Get JET set


Confusion over Brexit is adding to the anxiety of staff at a crucial UK research site for fusion energy.


Prime Minister Theresa May conceded on 21 May that a
post-Brexit Britain was willing to pay to “fully associate”
with Euratom, Europe’s nuclear agency. The details of the
arrangement, similar to many that surround the controversial exit of
the United Kingdom from the European Union, still have to be ironed
out. And among those watching the negotiations with mounting
concern are scientists at the Joint European Torus (JET) near Oxford,
UK, who currently benefit greatly from Britain’s membership of the
agency. The hundreds of researchers at JET receive annual funding
of around €60 million (US$70 million), because Britain is part of
Euratom. As it stands, that funding will cease at the end of this year.

The JET facility serves as a key testing ground for ITER, the
ambitious experimental fusion reactor being constructed in southern
France. For the past three years, JET has been preparing a test run
using a mixture of two hydrogen isotopes, deuterium and tritium, to
mimic ITER’s planned eventual fuel mix. The test should give the best
indication yet of the likely performance of ITER’s particular fusion
method — which uses magnetic fields to confine a burning, ionized
gas (or plasma) within a doughnut-shaped ring. The run should also
help to guide the design of a prototype power plant to follow ITER.

The JET experiment is clearly a crucial project that needs support.
But with Brexit looming, where will that support come from?

In theory, the EU can keep paying for JET in the short term. A progress
report on the Brexit negotiations, published late last year, says the
United Kingdom can continue to pay into and participate in EU funding
programmes until December 2020. And Britain has confirmed that it
will keep paying its (much smaller) direct share of JET costs until then,
too. Moreover, the European Commission (EC) has said that the EU
should continue to fund JET; cash for the lab is thought to be included
in the EC’s draft programme of fusion-research funding in 2019-20.

But there is a snag. Before the EU can publicly confirm any plans to
extend JET’s contract, a number of legislative hoops need to be jumped
through. And the process is dragging. The problem lies in how fusion
research is funded. A quirk of history means that Euratom’s research
funding is allocated in 5-year periods — the current one ending in
December 2018 — followed by 2-year top-ups that align the programme
with the EU’s 7-year research-funding cycles (the latest of which ends
in 2020). Although the top-up is a routine process, it requires the EU
Council to approve new legislation, and that has yet to happen.

Renewal of JET’s contract has gone down to the wire before, but the
added uncertainty of Brexit is making staff nervous. It hardly helps
that the site is repeatedly highlighted in the UK press as a potential
casualty of Brexit, rarely with the caveat that its contract should be
secure until the end of 2020. JET’s chief executive, Ian Chapman, told
Nature last year that some top-level staff had already found positions
elsewhere. The longer the process drags on, the less attractive JET will
seem to researchers.

One wrinkle has already been ironed out: draft text of the EU
legislation has been tweaked to allow its fusion programme to include
JET, even if the facility sits outside existing funding schemes. But a vote
on the proposed regulation has been delayed by a decision to consult 
the European Parliament — largely a courtesy that has nothing to do 
with JET. And because the parliament is unlikely to offer an opinion 
until September, the final sign-off might now not come until December.
No legislation means no research programme, which means no
JET contract. The result is that staff at the facility might not officially
know whether they have a job on 1 January 2019 until just days
before — let alone be able to do the important deuterium-tritium run.

The facility itself is ploughing ahead with its preparations for the run,
under the assumption that it will be funded for the next
two years. It has no choice but to do so. The planned experiments are
key to understanding how plasma will behave in reality, and nowhere
else in the world can do the research before ITER is due to begin.
Things will probably work out. But the prime minister’s concession
regarding Euratom is yet another example of how much her government
seems to be making up its Brexit policy as it goes along. Hoping
that things will work out is no way to reassure anyone, let alone a basis
for strategy. Politicians should act to secure JET’s funding for the next
two years — and beyond.


Racing hearts 


Japan must show that a promising therapy for 
damaged hearts works as claimed. 


As we report in a News story this week (page 619), Japan is set
to push ahead with a promising treatment for heart disease
that relies on stem cells. It could soon be made available under
a fast-track approval system that the country put in place in 2014.
Designed to speed access to regenerative therapies, the law allows
prospective treatments to be marketed and used as long as they have
been proved to be safe. Only a suggestion of efficacy is required — with
more-convincing data supposed to be gathered retrospectively from
patients who have been given the approved treatment.

The system has its critics — Nature among them (see Nature 528,
163–164; 2015). The latest move adds further concerns.

The therapy is the work of a physician who was also the first to take
advantage of the new law with a related treatment: Osaka University
cardiac surgeon Yoshiki Sawa. There is no suggestion that Sawa has
not followed the rules, set out by the Pharmaceuticals and Medical
Devices Agency. He has. The issue is whether those rules are adequate
and appropriate and have the welfare of patients at their heart. They do
not. Treatments of no proven efficacy are being sold to patients (who
effectively subsidize the clinical trials to test them). They receive no 
refund if the therapy is subsequently found not to work. Patients also 
take risks: they undergo immunosuppression and the surgery itself. 

The new study takes induced pluripotent stem cells (iPS cells) that 
have been banked and characterized to ensure they are safe, and converts 
them to heart-muscle cells. These are then spread into a thin sheet that 
is attached to the weakened heart muscle. It is only the second clinical 
application of iPS cells and is generating excitement around the world. 
The problem is that the earlier treatment from Sawa — which is ongoing 
under the fast-track system — has yet to produce convincing results. 

In that treatment, approved in September 2015, patients received a 
sheet of muscle cells made from their own leg tissue, rather than from 
iPS cells. Called HeartSheet, the muscle sheet is attached to weakened 
heart muscle that has usually been damaged as a result of a heart attack 
or plaque build-up and is often the cause of heart failure. The scientists 
behind the treatment speculate that the muscle cells work by releasing 
growth factors, not by becoming supporting tissue themselves. Other 
researchers are sceptical. 

Now there are two new treatments being investigated for the same 
condition, and it’s impossible to know yet whether either will work or 
which might be best for individual patients. 

It makes sense that heart-muscle cells (used in the second study) 
might work better for the heart than leg-muscle cells (used in the first). 
Indeed, it was reported a decade ago that injecting muscle cells from
the leg did not improve heart function (P. Menasché et al. Circulation
117, 1189–1200; 2008).

Most physicians hoping to treat heart disease by way of regenerative
medicine have moved on to other strategies, with many looking to
heart-muscle cells. That doesn’t mean HeartSheet cannot work, but it does
raise the question of whether patients who are given it will benefit.
Sawa himself has raised the issue. At a symposium last month
touting the new iPS cell trial, he said “leg cells are not good, well, at
least not enough”. And the Osaka University web page announcing the
iPS cell trial says that HeartSheet was found to be ineffective for more
serious cases. Sawa told Nature that the cells work in some cases, but
that he expects the new iPS cell therapy to be more effective.

All this places a question mark over how the efficacy of HeartSheet
can be proved as required. Halfway through its scheduled 5-year plan,
fewer than 10 patients — of the 60 required by the terms of its
approval — have received the treatment. If the trial doesn’t reach 60,
the health ministry told Nature, there would either be an extension or
the ministry would try to make a decision on the basis of the available data.

Some physicians have called for the HeartSheet tests to end and the 
data to be assessed before the new iPS cell study can begin. That might be 
an over-reaction, but pressure on the Japanese government is increasing. 
The government needs to move quickly to make sure that evaluation of 
the HeartSheet therapy is as rigorous as promised. As more treatments 
emerge, officials should make sure that — fast track or not — they have 
a valid claim to efficacy before being sold to patients. 

A therapy for heart disease could be the first iPS-cell clinical
breakthrough that Japan so ardently desires. The country shouldn’t sell short
the promising technology or the patients who hope to benefit from it.


False testimony 


A lie-detection system being used by Spanish 
police highlights concerns about algorithms. 


If you live in southern Spain, last June was not a good time to lose
your smartphone and, as a way of getting an insurance payout,
falsely claim that you had been mugged. Ten police forces in
Murcia and Malaga had some extra help in spotting your deceit: a
computer tool that analysed statements given to officers about robberies
and identified the telltale signs of a lie. According to results published
in the journal Knowledge-Based Systems, the algorithm was so good at
pointing officers towards false claimants that detection of such offences
in one week was an impressive 31 and 49 for the respective regions,
up from an average of 3 and 12 closed cases over the entire month
(L. Quijano-Sanchez et al. Knowl.-Based Syst. 149, 155–168; 2018). The
government in Madrid is now rolling the system out across the country,
and its developers are trying to apply its machine-learning methods to
help detect other types of crime.

In this case, the algorithm flagged up suspicious wording (based on
a training set of statements known to be true and false), and left it up
to the police to question suspects and get them to confess. A person,
not a computer, made the final decision. Still, it’s another example of
the steady march of algorithms and artificial-intelligence (AI) systems
into public life and decision-making — and that’s a trend that makes
some people uncomfortable.

Last week, the UK House of Commons Science and Technology
Committee published a report, ‘Algorithms in decision-making’, that
summarizes many of those anxieties, and suggests some ways to allay
them. It’s timely. Also last week, the UK government announced plans
to make National Health Service (NHS) data available to companies
and others to help build AI-based tools for diagnosing cancer. And
the University College London Hospitals NHS Foundation Trust
announced a partnership with the Alan Turing Institute, which works
on data science and AI, to find ways of improving health care in the
NHS. It aims, for example, to use data sets of previous cases of people
who arrive at hospital with abdominal pain to develop a more effective
triage system.

Nature has raised concerns about the development of AI health-care 
algorithms before, particularly those that seek to diagnose disease (see 
Nature 555, 285; 2018). Although they show great promise, it is
crucial that they are developed with proper scrutiny and review of the
evidence. That has not always been the case so far.

The UK parliamentary report also discusses a controversial and
pertinent issue: how much could and should people who are affected
by algorithms’ decisions be told about how the software works? This
‘right to explanation’ is included in Europe’s new data-protection laws,
which came into force last week, although details on how this might
change practice are unclear. At present, only France has committed to
publishing the code behind algorithms used by the government. More
should follow its lead: in evidence to the parliamentary inquiry, the
UK government said its departments used such programs widely;
this includes HMRC, the department that calculates and collects tax.

Some witnesses to the inquiry claimed that most people would not 
understand an explanation of how such software works. Others said 
that to open the ‘black box’ and lay out how an algorithm works is itself 
a difficult problem and one compounded by trade secrets. One option, 
as the report details, is to offer context that helps people to understand 
the algorithm's workings: to tell someone who has been refused a loan, 
for example, that the computer helping to make the decision required 
them to be earning £15,000 (US$20,000) more a year. 
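The loan example describes what is often called a counterfactual explanation: instead of opening the black box, it tells the applicant what would have had to be different for the decision to change. The sketch below is a minimal illustration under stated assumptions: the decision rule is a single hypothetical income threshold (`INCOME_THRESHOLD_GBP`, set to an invented £45,000), and the names `Applicant`, `decide` and `explain` are made up for this example. Real credit-scoring systems weigh many variables at once, so finding the smallest change that flips a decision is far harder than a subtraction.

```python
from dataclasses import dataclass

# Hypothetical decision rule for illustration only: approve a loan if and only
# if annual income meets a single fixed threshold. The value is invented.
INCOME_THRESHOLD_GBP = 45_000.0


@dataclass
class Applicant:
    annual_income_gbp: float


def decide(applicant: Applicant) -> bool:
    """Return True if the toy rule would approve the loan."""
    return applicant.annual_income_gbp >= INCOME_THRESHOLD_GBP


def explain(applicant: Applicant) -> str:
    """Counterfactual explanation: state the change that would flip a refusal."""
    if decide(applicant):
        return "Approved."
    shortfall = INCOME_THRESHOLD_GBP - applicant.annual_income_gbp
    return f"Refused: approval required an income £{shortfall:,.0f} a year higher."


print(explain(Applicant(annual_income_gbp=30_000)))
```

Under these assumptions an applicant earning £30,000 is told that approval required an income £15,000 a year higher, the same kind of context the report describes, without revealing anything else about how the rule is built.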

Revealing such details does, of course, allow people to try to game the 
system. The Spanish police face this problem, too: in describing how 
their software detects fibs, they are handing advice to those who would 
lie to them in future about being robbed. This information is already 
in the public domain, so we’re not breaking any confidences by
repeating it here: avoid mention of the brand names of what was stolen,
don’t say the attacker came from behind, and make your statement as
long as possible. Still, the Spanish police have an incentive to publicize
their system: they hope it will act as a deterrent. In this case, El Gran
Hermano really is watching you.
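Why publicizing such cues invites gaming can be seen with a deliberately crude sketch. It is not the Spanish police system, which learned its cues from statements labelled true or false; it simply counts the three publicly reported warning signs, using a hypothetical brand list (`BRAND_NAMES`) and an invented 40-word cut-off for a "short" statement. Both values are assumptions made for illustration.

```python
import re

# Hypothetical, illustrative cue list built only from the three publicly
# reported hints; a trained classifier would learn its own features and weights.
BRAND_NAMES = {"iphone", "samsung", "apple", "huawei"}
SHORT_STATEMENT_WORDS = 40  # invented cut-off for a "short" statement


def suspicion_score(statement: str) -> int:
    """Count how many of the three toy cues a statement shows (0 to 3)."""
    text = statement.lower()
    # [^\W\d_]+ matches runs of Unicode letters, so accented Spanish words work too.
    words = re.findall(r"[^\W\d_]+", text)
    score = 0
    if BRAND_NAMES.intersection(words):
        score += 1  # names the brand of the stolen item
    if "from behind" in text:
        score += 1  # says the attacker approached from behind
    if len(words) < SHORT_STATEMENT_WORDS:
        score += 1  # statement is short
    return score


print(suspicion_score("A man came from behind and took my iPhone."))
```

The example statement trips all three cues and scores 3. Anyone who has read the cue list can push a fabricated story to 0 by dropping the brand, changing the direction of the attack and padding the text, which is exactly the trade-off between explanation and gaming that the editorial describes.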
WORLD VIEW


No reproducibility without preproducibility


Instead of arguing about whether results hold up, let’s push to provide
enough information for others to repeat the experiments, says Philip Stark.


From time to time over the past few years, I’ve politely refused
requests to referee an article on the grounds that it lacks
enough information for me to check the work. This can be a
hard thing to explain.

Our lack of a precise vocabulary — in particular the fact that we
don’t have a word for ‘you didn’t tell me what you did in sufficient
detail for me to check it’ — contributes to the crisis of scientific
reproducibility. In computational science, ‘reproducible’ often
means that enough information is provided to allow a dedicated
reader to repeat the calculations in the paper for herself. In
biomedical disciplines, ‘reproducible’ often means that a different lab,
starting the experiment from scratch, would get roughly the same
experimental result.

In 1992, philosopher Karl Popper wrote: “Science may be described
as the art of systematic oversimplification — the art of discerning
what we may with advantage omit.” What may be omitted depends on
the discipline. Results that generalize to all universes (or perhaps do
not even require a universe) are part of mathematics. Results that
generalize to our Universe belong to physics. Results that generalize to
all life on Earth underpin molecular biology. Results that generalize
to all mice are murine biology. And results that hold only for a
particular mouse in a particular lab in a particular experiment are
arguably not science.

Communicating a scientific result requires enumerating, recording
and reporting those things that cannot with advantage be omitted.
This harks back to the idea of science as a way to build knowledge
through careful experimentation. Ushering in the Enlightenment
era in the late seventeenth century, chemist Robert Boyle put forth
his controversial idea of a vacuum and tasked himself with providing
descriptions of his work sufficient “that the person I addressed them
to might, without mistake, and with as little trouble as possible, be able
to repeat such unusual experiments”.

Much modern scientific communication falls short of this standard.
Most papers fail to report many aspects of the experiment and analysis
that we may not with advantage omit — things that are crucial to
understanding the result and its limitations, and to repeating the work.
We have no common language to describe this shortcoming. I’ve
been at conferences where scientists argued about whether work was
reproducible, replicable, repeatable, generalizable and other ‘-bles’, and
clearly meant quite different things by identical terms. Contradictory
meanings across disciplines are deeply entrenched.

The lack of standard terminology means that we do not clearly
distinguish between situations in which there is not enough
information to attempt repetition, and those in which attempts do not yield
substantially the same outcome. To reduce confusion, I propose an
intuitive, unambiguous neologism: ‘preproducibility’. An experiment
or analysis is preproducible if it has been described in adequate detail
for others to undertake it. Preproducibility is a prerequisite for
reproducibility, and the idea makes sense across disciplines.

The distinction between a preproducible scientific report and 
current common practice is like the difference between a partial list of 
ingredients and a recipe. To bake a good loaf of bread, it isn’t enough to 
know that it contains flour. It isn’t even enough to know that it contains 
flour, water, salt and yeast. The brand of flour might be omitted from 
the recipe with advantage, as might the day of the week on which 
the loaf was baked. But the ratio of ingredients, the operations, their 
timing and the temperature of the oven cannot. 

Given preproducibility — a ‘scientific recipe’ — we can attempt to
make a similar loaf of scientific bread. If we follow the recipe but do 
not get the same result, either the result is sensitive to small details 
that cannot be controlled, the result is incorrect or the recipe was 
not precise enough (things were omitted to 
disadvantage). 

Depending on the discipline, preproducibility
might require information about materials
(including organisms and their care), instruments
and procedures; experimental design; raw
data at the instrument level; algorithms used to
process the raw data; computational tools used
in analyses, including any parameter settings or
ad hoc choices; code, processed data and software
build environments; or analyses that were
tried and abandoned.
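As one illustration of the computational items in that list, the Python sketch below records a few of them (the software environment, a fingerprint of the raw data, parameter settings and analyses that were tried and abandoned) in a machine-readable manifest. It is a possible approach under stated assumptions, not a standard and not the author's own practice; the function `build_manifest`, its field names and the example data, parameters and abandoned analysis are all hypothetical.

```python
from __future__ import annotations

import hashlib
import json
import platform
from datetime import datetime, timezone


def build_manifest(raw_data: bytes, parameters: dict, abandoned: list[str]) -> dict:
    """Collect some of the details a reader would need to repeat an analysis."""
    return {
        "python_version": platform.python_version(),  # interpreter used
        "platform": platform.platform(),              # operating system details
        # A SHA-256 digest lets others confirm they hold exactly the same raw data.
        "raw_data_sha256": hashlib.sha256(raw_data).hexdigest(),
        "parameters": parameters,                     # settings and ad hoc choices
        "analyses_tried_and_abandoned": abandoned,
        "recorded_at_utc": datetime.now(timezone.utc).isoformat(),
    }


# Hypothetical example inputs.
manifest = build_manifest(
    raw_data=b"sample,value\nA,1.0\nB,2.5\n",
    parameters={"smoothing_window": 5, "random_seed": 42},
    abandoned=["log-transform before fitting (residuals looked worse)"],
)
print(json.dumps(manifest, indent=2))
```

A manifest like this covers only part of the recipe: the code itself, the full build environment and, for laboratory work, materials and procedures would still have to be published alongside it.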

Peer review is hamstrung by lack of
preproducibility: referees and editors cannot
provide serious quality control unless they are given enough 
information. Preproducibility will bring us closer to the ideals of the 
Enlightenment, providing crucial evidence about whether a reported 
result is correct and about how far the result can be generalized. 

Science should be ‘show me’, not ‘trust me’; it should be ‘help me if
you can’, not ‘catch me if you can’. If I publish an advertisement for my
work (that is, a paper long on results but short on methods) and it’s
wrong, that makes me untrustworthy. If I say: “here’s my work” and
it’s wrong, I might have erred, but at least I am honest. If you and I get
different results, preproducibility can help us to identify why — and 
the answer might be fascinating. 

Just as I have pledged not to review papers that are not 
preproducible, I have also pledged not to submit papers without 
providing the software I used, and — to the extent permitted by law 
and ethics — the underlying data. I urge you to do the same. The 
commitment that Boyle made to the scientific community is even 
more crucial today.


Philip B. Stark is a professor of statistics who specializes in inference 
at the University of California, Berkeley. 
e-mail: stark@stat.berkeley.edu
The news in brief


Cool lab launch

NASA’s Cold Atom Laboratory arrived at the International Space Station
on 24 May. The US$83-million mission is designed to exploit the
microgravity of space to create the coldest point in the known
Universe. Once installed, the lab will cool clouds of atoms to a few
billionths of a degree above absolute zero so that they form a single
quantum state known as a Bose-Einstein condensate. Scientists will use
these clouds to probe quantum phenomena in ways that are impossible
on Earth. The lab launched on 21 May from the Wallops Flight Facility
in Virginia, on a supply rocket.


Instrument fail

The key instrument on the latest next-generation US weather satellite
is malfunctioning, the National Oceanic and Atmospheric Administration
(NOAA) said on 23 May. The cooling system for the Earth-observing
imager did not start up as planned after the March launch of the
GOES-17 satellite. The problem affects several infrared and
near-infrared detectors, rendering them too warm and degrading the
quality of the data that they collect, particularly at night. This will
affect the satellite’s ability to monitor storms and other weather
phenomena. The agency is trying to find a workaround to restore the
quality of the information.


Gravity mission

A pair of US-German gravity satellites launched into space on 22 May
from Vandenberg Air Force Base in California. The twin spacecraft will
monitor how water moves around the planet. The Gravity Recovery and
Climate Experiment Follow-On (GRACE-FO) mission picks up from the
first GRACE spacecraft, which operated between 2002 and 2017 and
provided crucial insights into Earth’s water cycle and other changes to
the planet’s mass. The new satellites will measure shifts in surface
gravity, which can occur because of processes such as ice loss from
polar ice sheets or groundwater extraction for irrigation. The first
scientific data from the GRACE-FO mission are expected in about seven
months.


Dark-matter detector draws a blank

The world’s largest experiment intended to detect weakly interacting
massive particles (WIMPs) has come up empty-handed after collecting
data for nearly a year. XENON1T is located 1.4 kilometres underground
at the Gran Sasso National Laboratory in central Italy. The experiment
looks out for the tiny flashes of light that should be given off when
WIMPs — a popular candidate for dark matter, which is thought to make
up 85% of the Universe’s matter — collide with atoms in 1,300 kilograms
of cold liquid xenon. On 28 May, researchers from the XENON1T
collaboration reported at seminars held simultaneously at Gran Sasso
and at CERN, Europe’s particle-physics laboratory in Geneva,
Switzerland, that no such flashes were detected. The data suggest that
WIMPs — if they exist — interact even more weakly with ordinary matter
than previously thought.


Nobel Prize centre

The construction of the Nobel Center’s new home in Stockholm has been
put on hold after a court opposed the building’s design. The planned
1.2-billion-kronor (US$137-million) bronze-clad structure is intended
to host the annual Nobel prize ceremonies. But Sweden’s Land and
Environment Court ruled on 22 May that the building would clash with
Stockholm’s historic waterfront environment. Nobelhuset, the company
that runs the centre, says it is waiting to see whether the City of
Stockholm, which is developing the project, will appeal against the
decision.


North Korea summit

US President Donald Trump has cancelled his planned summit with North
Korean leader Kim Jong-un, which had been set for 12 June in Singapore.
Trump announced his decision on 24 May, citing recent remarks by a
North Korean official who had described comments by US vice-president
Mike Pence as “ignorant and stupid”. US and North Korean officials had
been expected to discuss North Korea’s efforts to develop nuclear
weapons. However, days after the cancellation, media reports suggested
that both sides were still preparing for the meeting to go ahead. Also
on 24 May, North Korea said that it had dismantled its only known
nuclear test site, Punggye-ri. The government invited reporters from
several foreign news outlets to watch the demolition but did not allow
independent nuclear monitors to attend.
POLICY


The right to try

The US House of Representatives has passed a controversial bill
granting some critically ill people the right to access experimental
treatments that do not have approval from the US Food and Drug
Administration. The bill, voted on by the House on 22 May, had already
passed the Senate and will now move on to President Donald Trump, who
is expected to sign it into law. Several medical and patient-advocacy
groups oppose the ‘right to try’ legislation, saying that it might offer
people false hope and expose them to unnecessary risks.


Bear hunt approved

Wyoming officials approved regulations for the hunting of up to 23
grizzly bears (Ursus arctos horribilis, pictured) in the area around
Yellowstone National Park. Grizzlies in the park are still off-limits to
hunters, as are those in nearby Grand Teton National Park. The decision
came on 23 May, less than a year after the US government removed
grizzly bears in the greater Yellowstone ecosystem from the
endangered-species list. The hunt proposal, put forward by the Wyoming
Game and Fish Department in February, stirred controversy over whether
this population of bears had recovered from decades of hunting and
habitat loss. In April, 73 scientists asked Wyoming Governor Matt Mead
to halt the hunt until independent experts could review the proposal.


TREND WATCH

Opinions about which research contributions deserve authorship credit
on a paper vary markedly across scientific disciplines — and even
within fields. In a survey of nearly 6,000 researchers, most said they
would grant authorship for data interpretation or manuscript drafting.
But nearly half would almost never or only sometimes do so for people
who secured their funding. Social scientists tended to assign less value
to proposing ideas than did researchers in the pure, applied and natural
sciences.

[Figure: A question of credit. A survey of nearly 6,000 researchers across scientific fields reveals which contributions to research papers tend to attract authorship credit, and which don't. Contributions that most researchers say deserve credit: drafting paper, interpreting results, analysing data. Contributions that fewest researchers say deserve credit: troubleshooting, third-party analysis, correcting grammar. Response scale: almost always, usually, often, sometimes, almost never.]

Source: G. S. Patience et al. Preprint at bioRxiv http://doi.org/cqc5 (2018)


Brexit plan emerges

The United Kingdom has called for a close partnership with the
European Union on science and innovation after Brexit. A document
published on 23 May by the UK government department overseeing
negotiations to leave the EU outlines plans for a future science and
innovation pact. The proposal includes access to EU funding programmes
and research infrastructure. It also calls for wider agreements on data
sharing, intellectual property and the movement of researchers around
the EU. The document, which will be used in discussions with EU
officials, also makes it clear that the United Kingdom is willing to
respect the remit of the Court of Justice of the European Union in
relation to its participation in relevant EU science programmes. Prime
Minister Theresa May initially said she did not want the court to have
any jurisdiction in the United Kingdom after Brexit.


HEALTH


Nipah vaccine

Two US biotechnology firms will receive millions of dollars to develop
a vaccine against the rare but deadly Nipah virus. The World Health
Organization lists the infection, transmitted to humans by fruit bats
(Pteropus spp.), as a priority for research and development. On 24 May,
the Coalition for Epidemic Preparedness Innovations, a
multimillion-dollar initiative to develop and stockpile vaccines,
announced that it would give US$25 million to Profectus BioSciences and
another company, Emergent BioSolutions, which will provide technical
and manufacturing support. There is currently no approved vaccine or
treatment for Nipah virus, which kills about 75% of those infected. An
outbreak in southern India this month has killed at least ten people.


Telescope first

A pioneering telescope set-up — the first optical telescope permanently
dedicated to tracking the gaze of a radio telescope — launched in South
Africa on 25 May. The MeerLICHT instrument (Dutch for ‘more light’) in
Sutherland is linked to South Africa’s 64-dish MeerKAT radio telescope
near Carnarvon and will observe for the next 5 years. By pointing the
telescopes at the same part of the sky at the same time, researchers
hope to study astronomical events at many wavelengths, which could
reveal the causes of enigmatic, short-lived phenomena such as fast radio
bursts. MeerLICHT will focus on such transient events and on detecting
possible sources of gravitational waves. The €1-million
(US$1.2-million) project is a collaboration between Dutch, South
African and UK institutions.


PEOPLE


New AAS president

Australian biochemist and molecular biologist John Shine is the new
president of the Australian Academy of Science (AAS). Shine, a pioneer
in human gene cloning, most recently studied the genetics of inherited
kidney disorders at the Garvan Institute of Medical Research in Sydney,
where he was also executive director from 1990 to 2011. For the past
six and a half years, he has served as chair of Australian
biotechnology giant CSL, headquartered in Melbourne. Shine started his
five-year term at the AAS on 24 May, replacing chemist Andrew Holmes.
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Ata press conference in Tokyo, cardiac surgeon Yoshiki Sawa announces plans to use tissue derived from induced pluripotent stem cells to treat heart disease. 


Reprogrammed stem cells 
approved to mend hearts 


Japanese study is only the second application of induced pluripotent stem cells in people. 


DAVID CYRANOSKI 


Scientists in Japan now have permission to treat people who have heart disease with cells produced by a revolutionary reprogramming technique. The study is only the second clinical application of induced pluripotent stem (iPS) cells. These are created by inducing the cells of body tissues such as skin and blood to revert to an embryonic-like state, from which they can develop into other cell types.

On 16 May, Japan’s health ministry gave doctors the green light to take wafer-thin sheets of tissue derived from iPS cells and graft them onto diseased human hearts. The team, led by cardiac surgeon Yoshiki Sawa at Osaka University, says that the tissue sheets can help to regenerate the organ’s muscle when it becomes damaged, a symptom of heart disease that can be caused by a build-up of plaque or by a heart attack.

“It will excite worldwide attention, as many groups are working in the same direction,” says Thomas Eschenhagen, a pharmacologist at the University of Hamburg in Germany and chair of the German Centre for Cardiovascular Research.

The treatment will initially be given to three 
people over the next year. The team will then 
seek approval to conduct a clinical trial in 
around ten patients. If it proves safe, the treat- 
ment could then be sold commercially under 
Japan's fast-track system for regenerative 
medicine. 

The system, introduced in 2014, aims to speed the availability of potentially life-saving procedures. But critics say the system is flawed because it allows treatments to go on sale to patients before sufficient data have been collected to show that the procedures work.


MENDING BROKEN HEARTS 
In their technique, Sawa and his colleagues use 
iPS cells to create a sheet of 100 million heart- 
muscle cells. From studies in pigs, the team 
has shown that grafting these sheets of cells — 
each 0.1 millimetres thick and 4 centimetres 
long — onto a heart can improve the organ’s 
function. Sawa says that the cells do not seem 
to integrate into the heart tissue. He thinks that 
instead they release growth factors that help to 
regenerate the damaged muscle. 

Scientists say one advantage of the sheets is that they create their own cellular matrix and can maintain their structure without the need for scaffolding made from foreign materials, a feature of some other engineered tissues.

“It is a very elegant and clever way to 
deliver cells,” says Philippe Menasché, a 
cardiac surgeon at the Georges Pompidou 
European Hospital in Paris, who has also 
experimented with making tissue sheets. 

Pharmacologist Wolfram-Hubertus Zimmermann at the University Medical Centre Göttingen in Germany, who is also developing an iPS treatment for heart disease, says that the latest trial is based on work conducted by Sawa and other colleagues in Japan over the past 15 years.

Once Sawa’s team has treated its three 
patients, it will apply to conduct a clinical 
trial involving a further seven to ten people. 
If the treatment proves to be safe, and shows 
some signs of working, it can be approved 
for sale under the accelerated system. This 
allows researchers to bypass expensive 
large-scale clinical trials aimed at proving 
efficacy, and instead to use small pilot trials 
to show that the therapy is safe and shows 
an indication of efficacy. 

But some researchers say the bar for approving therapies for commercial use is too low. Even if the cells are found to be safe, there are risks associated with any surgery, and patients could give up other therapies for a treatment that might not work. Ethicists and regulators say the benefits of any new therapy must outweigh the risks.

Yoshiki Yui, a cardiologist at Kyoto 
University, says that, as well as meeting the 
requirements for safety, researchers should 
show that their treatment is effective, which 
would require testing it in larger numbers of 
people than are currently required. The eval- 
uation process should also use randomized, 
controlled clinical trials, the gold standard 
for demonstrating efficacy in medical 
research, he says. 

The iPS-cell therapy has potential, Yui adds, but under the current approval system, “we won’t know if it works or not” because it won’t have been tested in a controlled trial. “The biggest problem is there’s no adequate system of evaluation in Japan,” he says.

A spokesperson for the health ministry 
told Nature that the current approval sys- 
tem is sufficient because researchers must 
still show that a treatment works even if it 
has been approved for commercial use. 

Sawa agrees that a control group is 
important for proving efficacy, but notes 
that he is abiding by Japan’s rules, which 
don't require this before a treatment is made 
commercially available. 

He says the health ministry’s approval is an acknowledgement that the therapy “is scientifically and ethically justified” to be tested in patients. “Whether it really works, [we] have to find out now,” he adds. ■ SEE EDITORIAL P.611
Muography makes its mark


Little-known particles called muons are helping to map the 
insides of pyramids, and to spot missing nuclear waste. 


BY ELIZABETH GIBNEY 


The muon is going mainstream. The particle, a heavy version of the electron that constantly rains down on every square centimetre of Earth, is little known outside particle physics, but last year it helped archaeologists to make a stunning discovery of a previously unknown chamber in Egypt’s Great Pyramid.

Volcanologists and nuclear engineers are 
also finding new uses for muography, which 
harnesses muons to probe the innards of dense 
structures. The first companies are looking to 
cash in. 

“The discovery in the pyramids last year has really put muography on the map,” says David Mahon, a physicist at the University of Glasgow, UK, who co-organized an international meeting called Cosmic-ray Muography, sponsored by the Royal Society and held on 14–15 May in Newport Pagnell, UK.


MUONS ARE EVERYWHERE 

Muons have the same negative charge as 
electrons but 200 times the mass. They are 
made when high-energy particles called 
cosmic rays slam into atoms in Earth's atmos- 
phere. Travelling at close to the speed of light, 
muons shower Earth from all angles. Every 
hand-sized area of the planet is hit by roughly 
one muon per second, and the particles can 
pass through hundreds of metres of solid mate- 
rial before they are absorbed. 

Their omnipresence and penetrating power
make muons perfect for imaging large, dense
objects without damaging them, says Cristina 
Carloganu, a physicist at the Clermont- 
Ferrand Physics Laboratory in France. The 
denser materials are, the more energy they 
absorb from the particles, so physicists can 
track how often muons of different energies 
reach detectors placed around a target, and 
compare that with the expected rate without an 
obstacle, to build up a 3D profile of the density 
of the interior. 
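The comparison of observed and expected rates can be illustrated with a toy calculation. This is a hypothetical sketch, not the reconstruction software of any group mentioned here: it assumes muon counts have already been binned by viewing direction (the direction labels and counts are invented), divides each observed count by the count expected with no obstacle, and flags directions where markedly more muons get through than is typical, which suggests less material along that path, such as a hidden cavity. Real muography combines many such directions and detector positions to build a 3D density profile.

```python
from statistics import median


def transmission(observed: dict[str, int], expected: dict[str, float]) -> dict[str, float]:
    """Fraction of the open-sky muon count that survives along each direction."""
    return {d: observed[d] / expected[d] for d in observed}


def flag_voids(ratios: dict[str, float], excess: float = 1.5) -> list[str]:
    """Return directions whose transmission exceeds the median by the factor `excess`.

    More muons than typical along a path suggests less material there.
    The 1.5 threshold is an arbitrary, illustrative choice.
    """
    typical = median(ratios.values())
    return sorted(d for d, r in ratios.items() if r > excess * typical)


# Hypothetical counts from one exposure, binned by viewing direction.
expected = {"north": 1000.0, "north-east": 1000.0, "east": 1000.0, "south": 1000.0}
observed = {"north": 120, "north-east": 118, "east": 210, "south": 125}

ratios = transmission(observed, expected)
print(ratios)
print(flag_voids(ratios))  # → ['east']
```

Using the median as the "typical" value keeps a single anomalous direction from dragging the baseline towards itself.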

Physicists have been experimenting 
with the technique since the 1950s, includ- 
ing an unsuccessful search for hidden 
chambers in the second-largest pyramid 
at Giza (L. W. Alvarez et al. Science 167, 
832-839; 1970). But the room-sized detec- 
tors were expensive and impractical, says 


Raffaello D'Alessandro, a particle physicist 
at the University of Florence, Italy, and a co- 
organizer of the muography meeting. They 
could weigh more than 10 tonnes and relied
on muons’ ability to ionize particles of sometimes explosive gases.


CHANGE OF TACK 

Ways to track the paths of charged particles 
more precisely — developed at facilities such 
as CERN, Europe's particle-physics labora- 
tory near Geneva, Switzerland — have made 
for safer, smaller and more-sensitive muon 
detectors. They can now be as compact as a 
few square metres and can run on solar pan- 
els, making it possible to take them to remote 
field sites. 

Volcanoes have become a popular target for the technique, thanks to pioneering work by researchers in Japan. Mapping lava channels, which absorb less energy from muons than does the dense surrounding rock, could one day help to predict eruptions, says Carloganu.
This year, researchers will try to image the 
solidified plug of lava inside Italy’s Mount 
Vesuvius. 

Smaller devices are also being used in 
archaeology, says Giulio Saracino, a physi- 
cist at the University of Naples Federico II in 
Italy. He and his team have mapped cavities 
and tunnels under Mount Echia, a settle- 
ment in Naples that has been occupied since 
the eighth century BC. They also plan to look
for a rumoured aqueduct beneath the nearby 
ancient city of Cumae. 

A spate of commercial applications for 
muography — five were presented at the 
conference — probe smaller samples, such as 
drums of nuclear waste. These applications 
often use a slightly different technique, which 
tracks how muons change direction when they 
hit atomic nuclei in a material. 

Muon detectors are now small enough to take to field sites such as the Great Pyramid of Giza in Egypt.
SCAN PYRAMIDS MISSION

Macmillan Publishers Limited, part of Springer Nature. All rights reserved.

By placing detectors on both sides of a sample, physicists can recreate a particle’s trajectory. And because the angle of deflection correlates with the density of the substance the muon hits, studying these paths can help to create a density map of the material being probed. Engineers can use this method to spot stray fragments of uranium inside containers of nuclear waste, even if the uranium is encapsulated in concrete or steel.
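Why heavy material such as uranium stands out can be sketched with the Highland approximation for multiple Coulomb scattering, a standard particle-physics formula. It is our choice of illustration, not a method described by the firms or laboratories mentioned here. The spread of deflection angles depends on a material's radiation length, which is much shorter for dense, heavy elements such as uranium than for concrete or steel. The radiation lengths below are approximate textbook values, iron stands in for steel, and the 3 GeV muon momentum and 10 cm thickness are assumptions.

```python
import math

MUON_MASS_MEV = 105.66  # muon rest mass in MeV/c^2

# Approximate radiation lengths in cm (textbook values, used here for illustration).
RADIATION_LENGTH_CM = {
    "concrete": 11.55,
    "iron": 1.757,
    "uranium": 0.3166,
}


def highland_theta0(p_mev: float, thickness_cm: float, x0_cm: float) -> float:
    """RMS plane scattering angle (radians) for a singly charged particle.

    Highland approximation:
    theta0 = 13.6 MeV / (beta*c*p) * sqrt(x/X0) * (1 + 0.038 * ln(x / (X0 * beta^2)))
    """
    beta = p_mev / math.hypot(p_mev, MUON_MASS_MEV)  # v/c from momentum and mass
    t = thickness_cm / x0_cm  # thickness in units of radiation length
    return 13.6 / (beta * p_mev) * math.sqrt(t) * (1 + 0.038 * math.log(t / beta**2))


for material, x0 in RADIATION_LENGTH_CM.items():
    theta = highland_theta0(p_mev=3000.0, thickness_cm=10.0, x0_cm=x0)
    print(f"{material:>8}: {1000 * theta:.1f} mrad")
```

For the same thickness, uranium comes out with a deflection several times larger than concrete, which is the contrast that scattering-based imaging exploits.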

“To get information about what is deep in 
the centre, muons are pretty much the only 
thing that can do that,” says Mahon. He directs
a firm called Lynkeos Technology based in 
Glasgow, which will start imaging nuclear- 
waste samples next month at the UK National 
Nuclear Laboratory at Sellafield. 

In the United States, trials at the Los Alamos National Laboratory in New Mexico have found that similar technology can spot where fuel rods have been removed from casks of spent fuel. Just four stolen fuel rods would provide enough plutonium to build a primitive nuclear weapon, Los Alamos physicist Christopher Morris told the conference.

Israeli firm Lingacom, based in Tel Aviv, is also investigating using the technique in security screening, for example at border crossings, to inspect containers for smuggled nuclear material. Other firms plan to use muography to track the wear of oil-industry pipelines and search for minerals in old mines.

But in many academic fields, the technology 
is still greeted with shrugs and quizzical looks. 
Despite finds such as the Great Pyramid’s hid- 
den chamber, the technology is still relatively 
unproven. “It’s a new, very specialist technique 
that comes from the high-energy-physics 
world,” says Saracino. “The first time I say to 
geologists that we have muon technology, they 
say, ‘What are muons?’ They are fascinated, but 
also a little bit wary.” ■


DIVERSITY 


Fewer African American 
men going into medicine 


Diversity advocates seek strategies to correct alarming decrease. 


BY GIORGIA GUGLIELMI 


Even as US diversity initiatives try to increase the representation of minority ethnic groups in science and medicine, the proportion of black men pursuing such careers is reaching historic lows. In 1986, 57% of African American medical-school graduates were men, but by 2015 that share had dropped to just 35%, even as the total number of black graduates had increased.

Given the extent of racism and discrimination, “it’s difficult for black males to be able to progress,” says Cato Laurencin, a surgeon-scientist at the University of Connecticut in Farmington. Laurencin chaired a workshop on the issue that was convened last November by the US National Academies of Sciences, Engineering, and Medicine and the Cobb Institute, a non-profit group in Washington DC that studies health disparities and racism in medicine.

A report from the workshop, released on 
18 May, examines factors that contribute to the 
growing absence of black men in science and 
medicine, as well as current models and strategies for boosting participation (see go.nature.com/2lo4p3b).

Although more African American students 
attend medical schools today than 30 years ago, 
the increase is due to greater numbers of black women training to be physicians.
The proportion of men among African- 
American medical students decreased by 
more than 20% over the same period. Data 
from the Association of American Medical 
Colleges show that, in 2015, 41% of black 
male applicants were accepted into medical 
school — the lowest rate across all genders 
and ethnicities. “This is a crisis that affects 
not only blacks, but also our national ability 
to have excellence in science and medicine,” 
Laurencin says. 

Racial diversity in the medical profes- 
sions can help to address health inequali- 
ties. Studies have shown that people from 
minority groups receive better care when 
their physicians have similar backgrounds. 

“Having racial diversity leads to not just 
more doctors, but also better-prepared 
doctors who go into communities of col- 
our,” says Liliana Garces, an education 
researcher and legal scholar at the Univer- 
sity of Texas at Austin. She adds that one 
promising strategy for increasing diversity 
in medical schools is reducing the admis- 
sion procedure’s emphasis on standardized 
tests, which “don't end up capturing the 
student's potential, and only contribute to 
more racial inequities in the student body”. 

Ross University School of Medicine in 
Portsmouth, Dominica, accepts students 
from under-represented minorities with 
lower standardized test scores and grade 
point averages than white applicants. The 
university — which has campuses in Dom- 
inica and the United States — gives these 
students educational support during the 
first semesters of medical school and con- 
nects them with a mentor from a similar 
background. 

Environments where black men can build 
a community help to improve graduation 
rates, Laurencin says. And programmes that 
give financial support to undergraduate stu- 
dents of colour and provide early exposure to 
research increase representation in science, 
technology, engineering and mathematics 
PhD programmes. 

But Freeman Hrabowski, president of the 
University of Maryland, Baltimore County, 
which runs one such programme, notes 
that universities and medical schools need 
funding to expand these efforts. “Without funding,” he says, “there is no serious commitment.” ■
A demonstration power plant run by NET Power in Houston, Texas.

Zero-emissions plant begins key tests

Start-up firm NET Power is developing a new approach to capturing and storing carbon.


BY JEFF TOLLEFSON 


A team of engineers in La Porte, Texas, has spent the past several weeks running tests on a prototype power plant that uses a stream of pure carbon dioxide, not air, to drive a turbine. If the zero-emission technology developed by NET Power in Durham, North Carolina, succeeds, it could help to usher in an era of clean power from fossil fuels.

The company broke ground on the roughly 25-megawatt plant in March 2016, after raising US$140 million for the project. It completed construction last year. It is now running a battery of tests on the combustor that powers the plant, a one-of-a-kind device built by the Japanese industrial giant Toshiba. If the tests go as planned, NET Power will hook up the turbine and begin generating electricity later this year.

Officials say everything is running smoothly 
so far. “We're still smiling,” says chemical engi- 
neer Rodney Allam, the facility's lead designer. 
Allam is now a partner with 8 Rivers, a technology company in Durham that co-owns NET Power with Exelon, a major electricity provider in Chicago, Illinois, and McDermott International, an energy-services company in Houston, Texas.

What separates the La Porte facility from a standard power plant is the CO2 cycle at its core. A conventional power plant burns fossil fuels to generate steam that drives a turbine, and it also emits CO2 as a by-product.

By contrast, NET Power will drive its turbine with a loop of hot, pressurized CO2. The first step is to fill the system with CO2, which must then be heated to drive the turbine, much like a conventional power plant heats water to create steam.

The combustor then ignites a mixture of natural gas and oxygen, which is extracted from the atmosphere in a separate facility. This heats up the CO2 in the loop that drives the turbine, but it also produces further CO2 that must be siphoned off to keep the system in balance.
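Why the loop keeps gaining CO2 follows from the combustion chemistry. Treating natural gas as pure methane (a simplification; pipeline gas also contains other hydrocarbons), each molecule burned in oxygen adds one molecule of CO2, so, using molar masses of about 44.01 and 16.04 grams per mole, every tonne of methane burned adds roughly 2.7 tonnes of CO2 that has to be drawn off:

```latex
\mathrm{CH_4} + 2\,\mathrm{O_2} \rightarrow \mathrm{CO_2} + 2\,\mathrm{H_2O}
\qquad
\frac{m_{\mathrm{CO_2}}}{m_{\mathrm{CH_4}}} = \frac{44.01}{16.04} \approx 2.74
```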
ENERGY ECONOMICS
The result is a stream of pure CO2 that can be buried or put into a pipeline, rather than released into the atmosphere, at almost no cost. That gives it an edge over existing technologies for stripping CO2 out of a conventional power plant’s exhaust; these drive up costs while sapping around 20% of the plant’s power.

Allam says that, if all goes well, NET Power’s technology will produce electricity as cheaply and efficiently as a conventional, modern gas-fired power plant, and will earn extra revenue by other means. For instance, oil companies might buy the plant’s excess CO2 and pump it into their wells to boost oil production. NET Power could also sell nitrogen and argon produced by the plant’s air separator.

A coal-fired power plant in Houston that is equipped with a competing CO2-capture technology is already delivering the gas it collects to a nearby oil field. The $1-billion Petra Nova project came online in January 2017. It uses an amine-based solvent to capture about one-third of the emissions from a single power-generating unit: up to 1.6 million tonnes of CO2 annually. But the project, a joint venture between NRG Energy in Princeton, New Jersey, and JX Nippon Oil and Gas Exploration in Tokyo, depended on both a $190-million grant from the US Department of Energy and additional oilfield revenue to turn a profit, says Daniel Cohan, an atmospheric scientist at Rice University in Houston. By contrast, he notes, NET Power claims that its power plant will turn a profit even before it begins selling CO2.

“If the plant does everything they say, it’s hard to imagine why you would want to build a traditional power plant,” Cohan says. “But there are still a lot of ifs ahead.”
One major challenge will be ensuring
proper combustion of oxygen and methane
in the presence of CO2, which normally acts
as a fire extinguisher. NET Power is several 
months behind schedule on this task, but pro- 
ject officials say that was the result of Toshiba's 
decision to test the plant’s combustor on site 
rather than sending it to an independent test 
facility; that meant installing and reconfigur- 
ing equipment at the otherwise complete plant. 

Once the project begins producing electric- 
ity, NET Power engineers must also show that 
the plant operates as efficiently as advertised, 
says Howard Herzog, who studies carbon 
capture and sequestration at the Massachu- 
setts Institute of Technology in Cambridge. 
The challenge, he says, will be to address the 
inevitable problems that arise when engineers 
are building the first-of-a-kind facility without 
sacrificing energy efficiency or driving up 
costs. 

NET Power officials say they are ready 
to take advantage of recently expanded US 
government tax credits for carbon capture 
and sequestration projects — beginning with 
a proposed 300-megawatt plant that could 
be operational by 2021. But the company’s 
chief executive, Bill Brown, says the firm 
isn't reliant on subsidies, and is already seek- 
ing customers and manufacturing partners 
abroad. It is also looking at potential markets for CO2, which could soon become a cheap chemical feedstock.

“We don’t like to rely on policy around here,” Brown says. “We like to rely on science.” ■


INTELLECTUAL PROPERTY 


Rush to protect billion- 
dollar antibody patents 


A US federal court decision has left the biotech industry working to preserve intellectual-property rights.


BY HEIDI LEDFORD 


Universities and biotechnology companies in the United States are scrambling to protect some of their most valuable assets: patents on antibodies.
These immune-system molecules form the 
basis of drugs that rake in about US$100 billion 
per year. But securing intellectual-property 
rights to antibodies has become much more 
difficult, under guidelines released in Febru- 
ary by the US Patent and Trademark Office 
(USPTO). 
The revised rules come after a federal court decision last October narrowed the scope of antibody patents, including those that have already been handed out. “People are still trying to make sense of it,” says Ulrich Storz, a patent attorney at Michalski Hüttermann & Partner in Düsseldorf, Germany. “These were very powerful patents.”

Storz and others will discuss the impli- 
cations of the shift on 6 June as part of a 
panel at the Biotechnology Innovation 
Organization annual meeting in Boston, 
Massachusetts. 


BROAD PROTECTIONS 

Antibodies are proteins made by the immune system that bind to a specific target, such as a protein produced by a microbe, to interfere with its ability to promote disease. This has made them powerful drugs against some illnesses.

Therapeutic antibodies are structurally 
complex, and in many cases, changes to 
their amino-acid sequences will not affect 
their function. So a patent based solely on 
an antibody’s sequence might be vulner- 
able to competition, says Barbara Rigby, a 
patent attorney at Dehns in Brighton, UK. 
A competitor could tweak the sequence to 
create a new antibody that performed the 
same function. 

In addition, for many years research- 
ers lacked the technology to sequence an 
antibody, to define how it bound its target 
or to introduce specific changes to its structure. Given these challenges, the
USPTO routinely granted broad patents that 
covered the suite of antibodies that attached 
to a particular target, rather than a specific 
antibody developed by an inventor. 


DEVIL IN THE DETAILS 

Over time, however, the technology for anti- 
body analysis has improved. In 2014, two 
pharmaceutical heavyweights — Amgen in 
Thousand Oaks, California, and Sanofi in 
Paris — launched a battle over patents cover- 
ing a potentially lucrative antibody treatment 
for high cholesterol. 

The case reached a federal appeals court, 
where judges determined last year that inven- 
tors must provide a better description of the 
actual invention — a more defined set of anti- 
bodies — that they wanted to patent. 

The USPTO responded with new guide- 
lines for its examiners this year. Since then, 
patent rejections have piled up. A few weeks 
ago, patent attorney John Kilyk of Leydig, 
Voit & Mayer in Chicago, Illinois, learned that 
an application he was handling had run into 
trouble. “It was sufficient a few months ago, 
and now it’s not,” he says. 

The court ruling is retroactive, so the move also jeopardizes existing antibody patents. “There’s no doubt that the biotech companies that have been patenting antibodies are going to be harmed,” says Storz. “There are a number of antibody patents that are now invalid, or would be if somebody tried to enforce them.”

Universities in particular might struggle to put together the information now needed to win an antibody patent, says Rodney Sparks, an attorney with the University of Virginia’s technology-transfer office in Charlottesville. Examiners are asking for more detail about the range of antibodies that can bind to a target and, specifically, where on the target those antibodies will attach.


“In universities, 
our guys want to publish,” Sparks says. “We 
don't have the ability, typically, early on to 
make lots and lots and lots of antibodies and 
screen for all of those characteristics.” As a 
result, he says, universities will need to file 
narrower patents covering only a few of the 
possible antibodies, and might struggle to find 
companies willing to license them. 

And applicants are facing pushback from patent examiners who are extending the tightened rules on an invention’s written description to other kinds of patent applications, says Rigby. A broad patent for a method to treat disease by targeting a specific protein, she says, might now also be in question.

“It’s not clear whether examiners have 
misunderstood and are overreaching, or 
whether this is a more general trend that the 
patent office is behind,” Rigby says.

Yet the shift has been an unexpected boon to 
some companies. Benjamin Doranz, president 
of Integral Molecular, a company in Philadel- 
phia, Pennsylvania, that produces and analyses 
antibodies, says that clients used to request 
analyses mainly to learn more about how 
their antibodies functioned. But increasingly, 
he says, the company’s data are being used to 
bolster patent applications. Some of its clients 
are now patent-law firms. 

Patenting antibodies has become much 
more treacherous, says Doranz. “But they’re 
still of great value,” he says, “so everyone is 
trying to figure out the new patent landscape, 
and how do we navigate it.” ■


CORRECTION 

The World View ‘Transparency rule is a 
Trojan Horse’ (Nature 557, 469; 2018) 
misstated the number of signatories to the 
joint statement. It omitted to mention Cell 
Press and PLoS journals. 
Thousands of Kenyans are taking part in a trial in which they will receive substantial monthly or yearly payments.


THE ANTI-POVERTY 
EXPERIMENT 


Several projects are testing the idea of doling out a 
‘universal basic income’ that people can use however they want. 


BY CARRIE ARNOLD 


Along the shores of Lake Victoria in western Kenya, mobile phones in several hundred villages ding in unison on the first of every month. For more than 21,000 adults, the sound means one thing: 2,250 Kenyan shillings appearing in their bank accounts. The cash equals one-quarter to half of the average income for a two-adult household in Bomet County, one of the poorest in Kenya.
The money (roughly US$22.50) arrives
courtesy of the US-based charity GiveDirectly, 
which is studying the effects of handing 
people lumps of cash with no strings attached 
— an idea known as a universal basic income 
(UBI). The mobile phones in these villages 
will ding every month for the next 12 years, 
making this UBI trial the longest and largest 
ever conducted. 

“It’s a poverty-alleviation tool. Participants can invest in riskier things because they have their basic needs taken care of,” says Tavneet Suri, an economist at the Abdul Latif Jameel Poverty Action Lab at the Massachusetts Institute of Technology in Cambridge and one of the lead investigators on the Kenya trial.

The Kenya experiment is one of a handful 
of UBI trials in various stages of development 
around the world. Finland has already begun 
a trial, as has Ontario in Canada. Stockton, 
California, is planning to roll out its own exper- 
iment later this year. Although the concept isn’t 
new — it was first proposed by Enlightenment 
philosophers — it remained a fringe idea until 
the past few years, and governments are now
starting to take it more seriously. Interest in 
the idea grew in the aftermath of the 2008 
economic crisis and because of endorsements 
from Silicon Valley tech gurus such as Elon 
Musk. 

Proponents of guaranteed income schemes 
argue that poor people will benefit more from 
unrestricted funds than from current welfare 
systems, which tend to have stringent require- 
ments that often leave recipients trapped in 
poverty. “Universal basic income is about 
giving people cash without question, 
and trusting that they know how to use 
it in the most-effective way they can,” 
says Luke Martinelli, an economist at 
the University of Bath, UK. 

For economists and public-policy 
scholars, the current interest in UBI 
provides an opportunity to conduct 
rigorous trials to determine whether 
it will produce measurable benefits. 

But translating a grand economic theory into 
workable policy is far from easy. Almost all 
trials have involved a small number of people 
or lasted just a few years, which limits their 
power. And there is no clear definition of 
success; researchers try to balance measur- 
ing potential gains in one area, such as health 
care, with potential offsets in another, includ- 
ing education and labour-force participation. 

But for the growing chorus of voices call- 
ing for data-driven policy, trials such as the 
one in Kenya are the only way to see whether 
UBI actually works. “This is one of the first 
rigorous randomized control trials of UBI,”
says Suri. “This is our chance to understand 
UBI and its impacts.” 


NO-STRINGS CASH 
The modern welfare state emerged out of the 
ashes of the Great Depression and the Sec- 
ond World War. As governments across the 
Americas, Europe and Commonwealth tried 
to rebuild their economies, they began taking 
an active role in providing for the well-being 
of their poorer citizens through grants, ser- 
vices and money earmarked for purposes such 
as housing and food. Although such welfare 
systems have improved standards of living, 
most require an immense bureaucracy to 
administer benefits and to ensure that recipi- 
ents meet strict qualification standards. 
Welfare critics have long argued that the 
administrative costs are huge and provide lim- 
ited positive results; in some cases, they dis- 
courage people from finding jobs. In response, 
leaders across the political spectrum have latched onto the idea of UBI, which has been promoted over the centuries by luminaries such as Thomas More (in his 1516 novel Utopia), philosopher Thomas Paine, the liberal US President Franklin Delano Roosevelt and economist Milton Friedman, a favourite of conservatives including US President Ronald Reagan and UK Prime Minister Margaret Thatcher. Progressive politicians and thinkers have seen the idea as a way to end poverty; conservatives have viewed it as a streamlined welfare system that is easier and cheaper to run.
In the 1960s and 1970s, a handful of sites 
across the United States tested a scheme 
related to UBI called a negative income tax. In 
this kind of programme, individuals making 
below a certain amount receive supplemental 
money from the government. But after early 
results from one of the trial sites revealed an 
increase in divorce rates, politicians nixed the 
idea as being toxic to the American family. 




Another early test happened across the 
border in the small prairie town of Dauphin, 
Canada. In a trial called Mincome, sup- 
ported by the federal and provincial govern- 
ments, the town’s poorest residents received 
monthly cheques from 1974 to 1978 with 
no constraints on how the money should be 
spent. Researchers tracked changes in the 
proportion of people working full- and part- 
time, as well as in nutrition, education and 
basic health outcomes. But before the trial 
could be analysed, waning funds and politi- 
cal change scrapped the idea, and all the data 
were packed in more than 1,800 boxes and 
stored in a warehouse. They sat there until 
economist Evelyn Forget at the University of 
Manitoba in Winnipeg brushed off layers of 
dust and opened the boxes. 

The documents Forget uncovered revealed 
that teenage children in Mincome families
completed an extra year of schooling com- 
pared with teens in similar small Manitoba 
towns. Hospitalizations decreased by 8.5%, 
with the largest drops in admissions for 
accidents and injuries and mental-health 
diagnoses. Importantly for economists, who 
worried that the programme might encourage 
people to quit their jobs, Forget found that 
employment rates stayed the same through- 
out the trial (E. L. Forget Can. Public Policy 
37, 283-305; 2011). 

Now, supporters of UBI in several countries 
are trying to build on the results from those 
earlier studies and develop trials to decide 
whether governments should give UBI a 
chance. 


READY MONEY 

The Kenyan experiment grew out of smaller 
trials that the charity GiveDirectly had done 
in sub-Saharan Africa. Starting in 2009, the 
group tried to alleviate poverty by providing 
relatively modest direct cash transfers. These 
shorter and smaller injections of cash cre- 
ated ripple effects through the communities 
involved. In a trial in Zimbabwe, one year of
cash transfers improved childhood vaccination 
rates and school attendance (L. Robertson et 
al. Lancet 381, 1283-1292; 2013). Because the 
transfers were only short-term and too small to 
cover living expenses, they weren't full-blown 
UBI trials. But that early work gave the group 
a leg up on planning a UBI trial, says Michael 
Faye, co-founder and president of GiveDirectly. 

Experts say that full-blown experiments 
are particularly difficult to design because 
they require a great deal of advance planning. 

“With a lot of these projects, the devil is
in the details, and the design of research
depends on a fine-grained knowledge of
its impact,” says Rob Reich, a political
scientist at California’s Stanford University,
who is not part of these trials.
Guaranteed-income experiments 
are different from many clinical tri- 
als because researchers are looking for 
improvements in a wide variety of areas 
and doing so across communities rather than in 
individuals. Suri describes a cycle of improve- 
ments that UBI might create: well-fed pregnant 
mothers should have healthier children than 
would undernourished women. Longer edu- 
cation would create better job opportunities 
and delay marriage and childbirth, resulting in 
healthier mothers and babies. Suri says that her 
team plans to track everything from entrepre- 
neurship to health, education and nutritional 
status, with the help of a platoon of locals who
will go door-to-door and do a series of short
phone check-ins and some in-depth interviews 
with village elders to get a big-picture view of 
the intervention’s effects. 

Because the trial will be so long and expen- 
sive, Suri helped to design four different arms 
to answer as many questions as possible. The 
largest arm provides 2,250 Kenyan shillings 
every month for 2 years to each adult in 80 
villages. A second arm provides the same 
amount of money each month for 12 years. 
A third arm provides a total equivalent to
US$505 (2 years’ worth of basic income) in 2
payments, separated by 2 months. The fourth arm
serves as a control group that gets nothing. 
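The arithmetic linking the arms can be checked quickly. This short Python sketch uses only the figures given above (2,250 shillings a month, 2 years, about US$505); the shilling-to-dollar rate it prints is simply what those figures imply, not an official exchange rate.

```python
# Figures from the text: 2,250 Kenyan shillings (KSh) per adult per month,
# and a lump-sum arm worth about US$505, described as 2 years of basic income.
MONTHLY_KSH = 2_250
MONTHS = 2 * 12
LUMP_SUM_USD = 505

total_ksh = MONTHLY_KSH * MONTHS          # 2 years of monthly payments
implied_rate = total_ksh / LUMP_SUM_USD   # shillings per US dollar implied

print(f"2 years of payments: {total_ksh:,} KSh")
print(f"implied exchange rate: {implied_rate:.1f} KSh per US$")
```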

“We can run a horse race between different
types of UBI,” Suri says.

Participants in a pilot project that began in 
2016 are enthusiastic about their prospects. 
“This has made me believe that I can commit 
and be able to pay school fees for my children 
and I am also confident of saving money to 
improve my business,” says Jael.

A UBI experiment in Finland has been 
struggling. The project grew out of concerns 
that the country’s complicated unemploy- 
ment benefit system keeps some people 
from returning to full-time work because 
that would curtail their support. In March 
2016, the government welfare agency Kela 
teamed up with a non-profit research organi- 
zation Tank to announce a UBI trial that 
would provide €560 (US$658) per month to 
a group of 2,000 adults currently receiving
unemployment benefits around the
country. The extra income would not be taxed
at the same rate as normal unemployment
benefits, nor would participants be required
to actively search for work during the two-year
trial. They also wouldn't lose the UBI income
if they found work.

MONEY FOR NOTHING

Canada conducted an early test of universal basic income
in the 1970s. Several governments and charities are now
experimenting with other schemes.

MINCOME TRIAL, CANADA
Payments: Up to Can$19,500 per year (2014 equivalent dollars)
The Mincome trial in the 1970s found numerous benefits,
including teenagers staying in school for an extra year.

ONTARIO, CANADA
Payments: Up to Can$16,989 per year for individuals
The Ontario trial has clearer experimental goals and more rigorous
data collection than the Mincome trial.

STOCKTON, CALIFORNIA
Payments: US$500 per month
This planned trial is very small, but economic problems in this city
are seen to mirror those facing the United States.

FINLAND
Payments: €560 per month
Critics argue that payments were small and that the applicability of
the trial is limited. Finland terminated the programme after 2 years.

KENYA
Payments: 2,250 Kenyan shillings per month
This is the largest and longest such trial in history. Payments
equate to one-quarter to one-half of average income. Payments
last for 2 years in some villages and for 12 years in others.

DURATION OF TRIALS
Stockton: 12-18 months; Finland: 2 years; Ontario: 3 years;
Mincome: 6 years; Kenya: 12 years.

Although global media initially praised the 
programme, popular opinion later soured 
on the scheme, which cost €20 million. The 
monthly payments were nowhere near enough 
to cover an adult's basic living expenses, and 
the trial addressed only adults who were 
unemployed at the time. Plus, there was no 
adequate control group. In late April 2018, 
the Finnish parliament refused requests from 
Kela for another year of funding and instead 
expressed a preference for developing
other welfare schemes. To UBI advocates, the 
problems with the Finnish programme have 
proved a serious barrier to getting other trials 
up and running. 

“People’s expectations were far higher than
the trial could deliver,” says Karl Widerquist, a
political economist at Georgetown University 
in Qatar and co-chair of the Basic Income 
Earth Network, which promotes UBI. 
Other researchers have started a range of
UBI trials and precursor projects (see ‘Money
for nothing’). The city of Stockton, California, has begun
an experiment that is underwritten by phil- 
anthropic organizations. The trial will be 
smaller than the Kenya trial — including only 
100 people for 12-18 months — owing to 
funding limitations, says Taylor Jo Isenberg, 
managing director at the Economic Security 
Project, who is helping to fund the experiment. 
But the effort will provide good preliminary 
data for later trials, she says, because Stockton 
is a microcosm of the United States in terms 
of diversity, poverty levels and job loss from 
automation and outsourcing. 

“Updating previous research on unconditional
cash transfers is really important but
really expensive. We hope to open the door
for other stakeholders to step in later,” Isenberg
says.

The Canadian province of Ontario is also 
hopping on the UBI bandwagon. Its trial 
began late last year, enrolling more than 4,000 
individuals across the province. As spectators
watch for signs of success and failure in these
trials, the researchers involved need to define
what, exactly, ‘success’ or ‘failure’ means. Given
that some of the impacts of UBI won’t be felt
for 5-10 years, onlookers might be waiting for
quite some time.

The announcement in April that the Finn- 
ish UBI trial wouldn't be funded beyond this 
year provided a sobering reminder that politics 
— more than data — will determine the fate 
of such programmes. The government pulled 
the plug before Markus Kanerva, managing 
director at Tank, and his colleagues had exam- 
ined the data to see how well the trial worked, 
something Kanerva says his team is waiting to 
do until late 2018 so as not to bias the results. 

The outcome of all these trials is far from 
clear, not least because the Kenya project — the 
most ambitious — only just began. 


PROBLEMS OF SCALE 

Over time, the trials could generate data on 
the costs and benefits of UBI schemes, such 
as whether the initiatives reduce health-care 
expenditures. But Martinelli thinks that the 
data will show that it will cost too much to 
make a programme effective. “An affordable
UBI is inadequate, and an adequate UBI is
unaffordable,” he says.

But even a clear win in these trials won't 
necessarily indicate that UBI would work in 
practice, says economist Damon Jones at the 
University of Chicago in Illinois. Because they 
are relatively small and most of the funding 
comes from private sources, the trials won't 
provide a sense of whether governments could 
afford a big public programme or whether 
citizens would be willing to fork out extra taxes 
to fund it. “Medicine can be scaled up, but
this isn’t as easy,” says Jones. A new cancer drug
might extend lifespan by 3 months, which stays 
true whether 10 people take the drug or 10,000. 
In a UBI trial, 10 people receiving cash will 
have a very different impact on a community 
compared with 10,000. 

Jones cautions that this doesn’t mean the UBI 
trials shouldn't be done or that they will pro- 
duce meaningless data, just that even the best- 
designed studies have inherent limitations. 

Regardless of the outcomes, the trials will 
have an ongoing impact because they can 
identify potential flaws in the process, help 
researchers refine the questions they ask and 
give policymakers some of the answers they 
crave. If the trials succeed, “it wouldn't just be
an outlier in social policy, it would be a minor
miracle,” Reich says.

For the participants of the Kenya trial, 
that minor miracle has already arrived. The 
knowledge that GiveDirectly will deposit 
funds into their accounts every month for 
more than a decade has already begun to shift 
how some of them think about money. Each 
text alert means a chance to invest in their own 
lives or their businesses with the security that 
they can still put food on the table. And that, 
they say, is priceless.


Carrie Arnold is a freelance journalist in 
Richmond, Virginia. 


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


ERIC RUDD/IU COMMUNICATIONS. 


COMMENT 


The case for inflammation as a cause of depression p.633
HISTORY Why Aldous Huxley’s post-nuclear dystopia feels newly relevant p.634
FUTURE OF WORK Microbes might end up filling the jobs that robots leave us p.637
OBITUARY Peter Grünberg, Nobel prizewinner for data-storage nanotechnology p.638


Michelle Lollie is an American Physical Society Bridge Fellow at Indiana University in Bloomington. 


Making physics 
more inclusive 


Theodore Hodapp and Erika Brown explain how the 
American Physical Society is helping to recruit and 
retain PhD students from under-represented minorities. 


African Americans, Hispanic Americans and
Native Americans make up about one-third
of university-age citizens in the United
States. Yet less
than 11% of bachelor’s degrees in physics 
are awarded to people from these groups. 
At the doctoral level it is even worse, with 
only about 7% of physics PhDs granted to 
US citizens from racial and ethnic minor- 
ity groups — just 60-70 students each year. 
This is one of the lowest rates in the sciences. 
Chemistry, by comparison, awards 17% of 
bachelor’s and 11% of doctoral degrees to 
these groups (see ‘Doctoral dearth’). The 
proportion in physics has barely risen over 
the past 15 years, while the percentage of US 
university-age students from minorities has 
grown by 18%. 

This is morally questionable and 
disastrous from a practical point of view. 
The discipline of physics, and society as a 
whole, are missing out on talent. Students 
are often judged on the prestige of their 
undergraduate institution or the prepara- 
tion they received at school, rather than on 
what really matters: their aptitude, drive 
and ingenuity. 

Physicists cannot fix all of society’s ills, 
but the community can and must provide 
more equitable pathways into research. This 
does not mean lowering the bar, but showing 
students where it is and helping them to find 
their way over it. 

For the past five years, the American 
Physical Society (APS) has been taking 
the first steps by working with physics 
departments across the United States to 
balance the doctoral and bachelor’s gradu- 
ation rates for under-represented students. 
Given that the numbers of students are 
small, interventions at a limited number 
of universities can drastically change the 
landscape. To effect this change, the APS 
has directed resources to overcoming 
admissions barriers and ensuring that 
graduate programmes where students are 
admitted have adequate support to help 
them remain on track. These support 
structures benefit all students. 

The APS Bridge Program¹ (funded in part
by the US National Science Foundation) 
asks physics faculty members to consider 
and recruit graduate students from under- 
represented minorities whom they think 
would do well in a doctoral programme 
but who, for whatever reason, have not
been accepted. Such recommendations
are permitted, although it is illegal in the 
United States to specify race or ethnicity in 
university admissions procedures as the sole 
criterion for a decision. 

After the standard mid-April cut-off for 
informing students of their acceptance into 
US graduate programmes, the APS collects 
applications from Bridge Program candi- 
dates and circulates them to institutions. The 
institutions take another look and select the 
students who are best for them. The depart- 
ments are required to mentor and monitor 
the progress of Bridge Program students. 
More than 35 US institutions are now 
working with the APS. 

There are currently around 150 students 
in the Bridge Program. In 2017, by accept- 
ing 46 students in one year, departments 
more than compensated for the difference 
between the doctoral and bachelor’s gradu- 
ation rates (see ‘Bridging the gap’). When the 
APS began the programme in 2012, it gave 
grants to universities to support most Bridge 
students. Now, most students are funded by 
the physics community; in 2017-18, the APS 
supported only six. 

We found no single root cause for why 
under-represented students were not 
accepted into graduate programmes in phys- 
ics. The problems were mostly systemic and 
circumstantial, not the fault of the students. 
Some students told us that they were unable 
financially to apply to more than a few pro- 
grammes, or that they were discouraged by 
perceived and real biases in application pro- 
cedures. Other factors included inadequate 
mentoring and preparation for research 
careers at the student’s undergraduate insti- 
tution. These hindrances are relatively easy 
to overcome. 

Here we discuss what we've tried, what we 
have found to work and what still needs to 
be explored. 


GRADUATE ADMISSIONS 

The first hurdle is the graduate admissions 
process. It is a well-guarded door lying 
between a student and a research career. 
Committees must contend with hundreds 
of applications and an incomplete picture 
of each student. Candidates with high 
scores in undergraduate mathematics and 
physics courses or entrance exams pass 
through the door easily, including some 
students from minority groups. Appli- 
cants who have mixed academic records 
can benefit from further consideration by 
admissions committees. 

Behind each CV is a story. What if you 
went to a substandard middle or high 
school, where your peers barely made it 
through algebra and the teacher taught far 
below your potential? What if you had to 
find a full-time job to finance your uni- 
versity education, leaving little time to 
study, much less excel? Some students in
our programme experienced these situations.
Remedies were as simple as extra
coursework to compensate for inadequate
preparation, a graduate stipend to provide
financial stability, or a committee that was
able to see past one poor mark to recognize
potential.

630 | NATURE | VOL 557 | 31 MAY 2018

DOCTORAL DEARTH
In all disciplines across the sciences, the proportion of US citizens from under-represented minorities
graduating with bachelor’s degrees is low; the proportion completing PhDs is even lower.
(Chart: students who graduated (%) with a PhD and with a bachelor’s degree in geosciences; physics and
astronomy; mathematics and statistics; engineering; chemistry; biological sciences; and computer science.
Physics: 6.6% of PhDs and 10.1% of bachelor’s degrees. Admissions processes have put off applicants for
PhDs in physics.)

In our experience, the biggest barrier 
to students getting into a physics doctoral 
programme is the Graduate Record Exami- 
nation (GRE), a standardized test required 
for admission into most graduate schools in 
the United States. More than one-third of
US graduate physics programmes will consider
only candidates whose scores in the physics
GRE test (P-GRE) exceed a cut-off². This
ignores the larger picture of a student’s
development and also goes against the advice
of the Educational Testing Service (ETS),
which produces the GRE. The ETS recommends
that GRE scores should never be the sole basis
for an admissions decision and should be
weighed against other factors³.

P-GRE scores conflate many things. 
Students need to prepare carefully because 
the scope and approach of the test are dif- 
ferent from how most undergraduates are 
taught and evaluated. In addition, many 
undergraduate institutions offer no tools 
or guidance to help students to prepare. It 
costs US$150 to take the test. Despite the 
best efforts of the ETS, the GRE tests suffer 
from biases resulting from students’ soci- 
etal experiences and expectations. Women 
and people from minority racial and eth- 
nic groups score lower than do white or 
Asian men, on average⁴. Candidates are
influenced by ‘stereotype threat’: members
of groups for which stereotypical expecta- 
tions are low perform worse in high-stakes 
exams when they are reminded that they 
are part of that group (see, for example, 
ref. 5). These factors, and a student's apti- 
tude for taking this type of test — or even 
how well they were feeling on the day — 
matter. 

Scientists should care most about poten- 
tial, not preparation. Even if admissions 
committees downplay the value of the GRE, 
students do not. Those with low scores are 
discouraged from applying to institutions 
that publish high average scores. 

The question then remains, how should 
admissions committees pick graduate 
students? 

This is both a philosophical and a practi- 
cal concern: what are committees’ goals in 
selecting a student, and how should they sort 
through a big pile of applications in a small 
amount of time? 

Philosophically, should committees try 
to identify the student who is already at the 
top of the applications pile, itself defined 
in part by systemic biases? Or should they 
try to spot someone who can develop to 
become an excellent researcher? The latter
mindset⁶ accommodates individuals who
might have grown up in places with few 
educational and mentoring resources avail- 
able, but who have a passion and aptitude 
for physics. Members of the physics com- 
munity should provide an opportunity for 
such individuals, irrespective of their social 
background. 

Practically, the APS works with depart- 
ments that are trying a variety of ways to 
select students. It’s too soon to tell how these 
strategies can be generalized. Each depart- 
ment has different needs and must find a 
technique that works for it. Some review 
all applications from target groups to find 
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APS Bridge Fellow Joseph (JB) Holmes is studying biological physics at Indiana University, Bloomington. 


compelling stories that indicate promise. 
Others shortlist potentially good candidates 
on paper and conduct short, 15-30-minute 
video interviews with each. These explore 
traits that are correlated with success, such 
as problem solving, tenacity and the abil- 
ity to assess your own weaknesses (see, for 
example, ref. 7). 


STUDENT SUPPORT 

The next step is to help PhD students to 
finish their doctorates. Mentoring and peer 
support are crucial. 

All graduate students face challenges. In a
2008 study, only 59% of US doctoral students
in physics completed their PhDs⁸. As well as
missing out on talent, it is expensive to lose
graduate students. Each requires upwards of
US$300,000 of direct support during their
studies, as well as resources, facilities and faculty
members’ time. Students are committing 
years of their life towards the long-term goal 
of engaging with the physics community. 

The Bridge Program, by contrast, has an 
average retention rate of around 85%. How 
have institutions done it? 

Interviews with Bridge students and their 
mentors have revealed that numerous practi- 
cal issues unrelated to academic ability can 
affect or destroy a graduate student's poten- 
tial to stay the distance. Examples include: 
living too far from campus to join in study 
or research sessions; being inexperienced 
in managing money; family commitments 


and dynamics; feelings of isolation; or poor 
advice on how to navigate the university sys- 
tem. Poverty exacerbates all these problems. 

Several mentors are preferable, includ- 
ing a research adviser, an academic adviser 
and someone whom the student feels has 
no power over them, such as a staff
member. Bridge students check in with their
mentors at least once every couple of weeks
during the first year so mentors can make
sure they are adjusting well. Meetings can
taper off as students find their groove. But
it is important that mentors intervene early
when problems arise, such as illness,
personal issues or courses that are pitched at
an inappropriate level.

BRIDGING THE GAP
In its fourth year, the American Physical Society's
Bridge Program admitted enough students to
erase the difference between graduation rates for
bachelor’s and PhD degrees in physics.
(Chart: number of PhD students placed in the Bridge Program and number who left it, 2013-2017,
against the number needed to balance doctoral and bachelor’s graduation rates.)

We have found the first six weeks to be 
crucial. Changes to a student's academic plan 
after this come too late — students facing 
obstacles already feel that pursuing gradu- 
ate education was a bad idea; isolation has 
set in. They might already be well into a
downward spiral that leads to dropping out.

Peer support is crucial, too. Institutions 
involved with the Bridge Program either 
had or have developed a physics graduate 
student association. These work on behalf of 
all students, but their activities can be pivotal 
for students from diverse backgrounds 
who are feeling isolated. The student asso- 
ciations assign more-senior students as 
mentors to new participants in the Bridge 
Program, hold social functions to welcome 
all students, and provide a space for them 
to share experiences and knowledge. Some 
hold student-only seminars — at which no 
faculty members are allowed — on careers, 
courses and campus life, providing a place 
to vent and learn. Representatives of these 
organizations can be a ‘student voice’ in 
conversations with the faculty. 

Problems can, and often do, occur late
in a student's studies as they navigate the
research and dissertation phases. In many
universities, a committee meets annually
to review the progress of each graduate
student. Ideally, the chair of such a committee
should be someone other than the student’s
research adviser, in case that relationship
sours. Faculty members might need to
devote extra mentoring time at this stage to
ensure the student finishes their work and
thesis.

Physics PhD student Keanna Jardine is investigating the dust dynamics of small asteroids at the University of Central Florida, Orlando.

The APS tracks all students in the Bridge 
Program. Along with academic transcripts, 
we ask mentors to evaluate each student's
progress towards a PhD. The proof is in the
retention rate: currently, 85% of our students
are on track, significantly more than the
national average. Students report that the 
programme gave them the chance they 
needed to pursue graduate studies. 

Our first students are likely to receive 
their PhDs in 2019. They will then start 
looking for postdoctoral jobs. The APS has 
begun to collaborate with national labora- 
tories in the United States — collectively 
the largest employer of physicists outside 
academia — to help match up Bridge Pro- 
gram graduates with job posts. We are also 
developing a mentoring curriculum for the 
researchers who sponsor these graduates, 
to help make them more aware of diversity
issues. 


NEXT STEPS 
To make the physics community more 
representative, we recommend three actions. 

First, graduate departments should aim 
to reflect the racial, ethnic and gender mix 
of the undergraduate population pool, at 
a minimum. They should use admissions 
techniques that look beyond conventional 
measures, to identify students who can be
successful leaders in the future⁹. Admissions
committees must educate themselves
on what the P-GRE is actually measuring, 
rather than what they think it is measuring. 

Second, graduate departments should 
foster more-supportive cultures for all 
students. Departments should offer 
undergraduate coursework where needed, 
mentor students throughout their studies 
— especially in the first few semesters — 
and formalize mentoring by peers. 

Third, we encourage other national organi- 
zations, such as the American Chemical 
Society, the American Geophysical Union 
and their equivalents in other countries to 
take a similar intermediary role. We have 
begun discussions with some of these and 
received enthusiastic responses. Moreover, 
similar interventions could reduce gen- 
der disparities in disciplines in which the 
percentage of women changes appreciably
between undergraduate and graduate stages
(in physics it does not). 

We must embrace diversity within the 
physics community. The world cannot afford 
to waste talent.


Theodore Hodapp is director of project 
development and senior adviser to the 
Education and Diversity Department at the 
American Physical Society, College Park, 
Maryland, USA. Erika Brown is Bridge 
Program manager at the American Physical 
Society. 

e-mail: hodapp@aps.org 
BOOKS & ARTS


Depression revisited


Alison Abbott considers a persuasive case that inflammation is linked to the disorder.


Depression affects one in four people
at some time in their lives. It is often
difficult to treat, in part because its
causes are still debated. Psychiatrist Edward
Bullmore is an ardent proponent of a radical
theory now gaining traction: that inflammation
in the brain may underlie some cases.
His succinct, broad-brush study, The Inflamed 
Mind, looks at the mounting evidence. 

The book outlines a persuasive case for 
the link between brain inflammation and 
depression. Bullmore pleads with the medi- 
cal profession to open its collective mind, 
and the pharmaceutical industry to open its 
research budget, to the idea. He provides a 
current perspective on how the science of 
psychiatry is slowly emerging from a decades- 
long torpor. He sees the start of a shift in the 
Cartesian view that disorders of the body 
‘belong’ to physicians, whereas those of the 
more ‘immaterial’ mind ‘belong’ to psychia- 
trists. Accepting that some cases of depression 
result from infections and other inflamma- 
tion-causing disorders of the body could lead 
to much-needed new treatments, he argues. 

In 1989, during his clinical training at 
St Bartholomew’s Hospital in London, 
Bullmore encountered a patient whom he 


calls Mrs P, who had 
severe rheumatoid 
arthritis. She left an 
indelible impression. 
He examined her 
physically and probed 
her general state of 
mind. He reported 
to his senior physi- 
cian, with a certain 
pride in his diagnostic 
skill, that Mrs P was 
both arthritic and 
depressed. Replied the 
experienced rheuma- 
tologist dismissively, 
given her painful, incurable physical condi- 
tion, “You would be, wouldn't you?” 

Mrs P is a recurring motif, as is the rhe- 
torical question. Bullmore draws on more 
than two millennia of medical history — 
from ancient Greek physician Hippocrates 
to the work of neuroanatomist and 1906 
Nobel laureate Santiago Ramon y Cajal — 
to illustrate his points. At times they seem 
like intellectual meanderings, but these 
passages also show how medical science 
often progresses by means of bold theories
that break away from received wisdom.

The Inflamed Mind: A Radical New Approach to Depression
EDWARD BULLMORE
Short Books (2018)

After his training, Bullmore specialized in 
psychiatry, and quickly experienced its limi- 
tations. He describes his growing awareness 
of how poorly science has served the field, 
using the development of selective seroto- 
nin reuptake inhibitors (SSRIs) as a prime 
example. 

That long and winding road began with 
the antibiotic iproniazid. It was discovered 
through scientific logic: by screening chem- 
icals for their ability to kill Mycobacterium 
tuberculosis in the test tube and in mice. 
Iproniazid transformed the treatment of 
tuberculosis in the 1950s. Patients clawed 
back from the jaws of death exhibited 
euphoria — well, you would, wouldn't you? 
— and the drug was soon launched as an 
antidepressant. Soon the theory emerged 
(based more on supposition than evidence, 
says Bullmore) that its psychiatric effects 
were the result of boosting the neurotrans- 
mitters adrenaline and noradrenaline. Drug 
developers began to focus on neurotransmis- 
sion more broadly. 

Prozac (fluoxetine), which boosts sero- 
tonin transmission, was launched in the 
mid-1980s, and many pharmaceutical 
companies quickly followed with their
own SSRIs. It seemed to be the revolution 
psychiatrists had been waiting for. But it 
soon emerged that only a modest subset 
of patients benefited (estimates based on 
trials vary widely). That is unsurprising in 
retrospect, with the new appreciation that 
depression can have many causes. Bull- 
more holds that the emergence of SSRIs 
bypassed scientific logic. The serotonin 
theory, he writes, is as “unsatisfactory as 
the Freudian theory of unquantifiable 
libido or the Hippocratic theory of non- 
existent black bile”. He notes that, after 
SSRIs failed to live up to the hype, time 
once again stood still for psychiatry. 

Bullmore recalls a teleconference in
2010, when he was working part-time
with a British pharmaceutical company.
During the call, the company announced
it was pulling out of psychiatry research
because no new ideas were emerging. In
the following years, almost all of ‘big
pharma’ abandoned mental health.

Then a window seemed to open — one 
that shed a different light on the plight of 
Mrs P. Some of the textbook certainty that 
Bullmore had learnt by rote at medical 
school started to look distinctly uncertain. 

In particular, the blood-brain barrier 
turned out to be less impenetrable than 
assumed. A range of research showed that 
proteins in the body could reach the brain. 
These included inflammatory proteins 
called cytokines that were churned out in 
times of infection by immune cells called 
macrophages. Bullmore pulls together 
evidence that this echo of inflammation 
in the brain can be linked to depression. 
That, he argues, should inspire pharma- 
ceutical companies to return to psychiatry. 

It seems unfair that someone struck 
down by infection should have depression 
too. Is there a feasible evolutionary expla- 
nation? Bullmore hazards that depression 
would discourage ill individuals from 
socializing and spreading an infection that 
might otherwise wipe out a tribe. 

Other brain disorders might turn out to 
be prompted or promoted by inflamma- 
tion. An exciting link with neurodegen- 
erative diseases, including Alzheimer’s, 
is also being studied (see Nature 556, 
426-428; 2018). But we need to learn 
from the rollercoaster history of brain 
research, and keep expectations in check. 
Beneath his bombastic enthusiasm, Bullmore
acknowledges this, too.




Alison Abbott is Nature’s senior
European correspondent.
IN RETROSPECT


Ape and Essence 


Richard Rhodes finds resonance with today’s uneasy 
nuclear age in Aldous Huxley’s satirical dystopia. 


The atomic bombings of Hiroshima and
Nagasaki in August 1945 shocked the
world. More than 70 years on, these
events have not been repeated — evidence
that it was the United States’ temporary
nuclear monopoly that made them possible.
Yet few observers in the immediate postwar
years foresaw the development of an uneasy
nuclear truce, enforced by the certainty of
mutual destruction. Fears of an arms race
culminating in nuclear war were widespread.
Such fears have now re-emerged, with North
Korea's burgeoning arsenal and the United
States’ abrogation of its agreement with Iran.

J. Robert Oppenheimer, who directed the
development of the first bombs, was among
those who feared nuclear conflict, and worked
for international control after the Second
World War. Physicist Richard Feynman
(whom I interviewed for my 1987 book The
Making of the Atomic Bomb) recalled sitting
in a bar in New York in 1946, watching the
crowd passing outside and thinking: “You
poor fools, you have no idea that in a few
more years you’ll all be dead.” Aldous Huxley
seems to have leapt to the same conclusion in
his hybrid novel and film scenario Ape and
Essence, published 70 years ago.


Aldous Huxley in the 1950s. 


Ape and Essence 
ALDOUS HUXLEY 
Harper & Brothers 
(1948) 


The prolific novelist 
and essayist had 
been formulating his 
thoughts on the bomb 
since the end of the 
war. In 1947, Huxley published the extended 
essay ‘Science, Liberty, and Peace’, a prelude
to the novel. There, he wrote that the power-
hungry and nationalistic “boy-gangster” in
us all would easily prevail over the reason-
able adult, exulting: “Press a few buttons and
bang! the war to end war will be over, and I
shall be the boss of the whole planet.” Huxley
knew better. If more than one nation had 
such weaponry, he believed, the outcome of 
“the war to end war” would be world-scale 
destruction. And because that would be a 
kind of singularity, it seemed to him that 
almost anything might follow. 

Ape and Essence is Huxley’s imagining of a 
post-nuclear world. The title is from William 
Shakespeare's Measure for Measure: Isabella 
speaks of the proud man’s “glassy essence, 
like an angry ape”, which “plays such fantastic
tricks before high heaven/As make the
angels weep.” The angels have flown in Hux- 
ley’s novel, set in what remains of Los Angeles, 
California, in 2108 — a century after a third 
world war, which would have taken place 
around now, in Huxley’s fictional timeline. 

In one of the book’s set pieces, intelligent 
baboons fight this twenty-first-century war, 
with scientific luminaries (Michael Faraday 
and two opposing Albert Einsteins) as leashed 
mascots. So much for scientists, Huxley 
insinuates — “good, well-meaning men, for 
the most part. But... they ceased to be human 
beings and became specialists.” Of the two 
opposing cultures described by scientist and 
novelist C. P. Snow in 1959, Huxley was clearly 
on the side of the humanities, as if locked in 
debate with his distinguished scientific kin — 
his biologist brother Julian, physiologist half- 
brother Andrew and zoologist grandfather 
Thomas Henry, known as Darwin's bulldog. 

Having introduced the baboons, Huxley 
kills them off: it’s a second false start to the 
novel’s stuttering story, a metafictional con-
coction as multilayered as an onion. The first
storyline sees two screenwriters tracking 
down a legendary colleague, only to find him 
dead. The deceased’s abandoned screenplay 
(I suspect one of Huxley’s own unsold efforts,
repackaged) is the book's centrepiece. It is 
here that the baboons rise and fall. The script
then moves on to an improbable love story set 
in the world of an irradiated rump of humans
who survived the war but have forgotten how 
to make things. They live by scavenging left- 
overs from the pre-war days, burning books 
for heat and assigning crews to rob old graves 
of suits and jewellery. Eunuch priests of the 
devil-figure Belial squat at the top of the caste 
system in this stunted world, dominating a 
society of near-slaves. 

Ape and Essence parallels Huxley’s 1932 
Brave New World (see P. Ball Nature 503, 338- 
339; 2013), yet offers an even darker vision. A 
young botanist, Alfred Poole, has arrived by 
ship from New Zealand, which survived the 
atomic war and is now exploring what’s left of
the world. So, as with Brave New World's Sav- 
age, a Candide-like hero appears from outside 
society and finds himself appalled. And what 
appals both heroes is the indiscriminate sex- 
uality that the society's leaders encourage to 
replace family and human love. 

The twist this time, as Huxley wrote to his 
fellow screenwriter Anita Loos, is that “the 
chief effect of the gamma radiations [has] 
been to produce a race of men and women 
who don't make love all the year round, but 
have a brief mating season”. This manifests 
as mass gropings; any progeny deemed too 
monstrous, the result of radiation-damaged 
genes, are then slaughtered on Belial Eve. 
(Huxley probably knew that Hermann Muller 
had received a Nobel prize in 1946 for the dis- 
covery that X-rays can cause mutations.) The 
ceremony, called the Purification of the Race, 
mimics the blood sacrifices of the Aztecs. It 
also alludes to eugenics, the British-Ameri- 
can pseudoscience embraced by Adolf Hitler. 

What all this sexualized barbarity has to 
do with nuclear war isn't clear. Born in 1894, 
Huxley brought a scolding Victorian sensi- 
bility to the loosened morals of the war-torn 
twentieth century, excoriating its hedonism 
in satires and science fiction. Sun-drenched, 
beauty-obsessed southern California, where 
he lived and worked from 1937 until his 
death in 1963, proved the ideal locale for his 
dystopias: a seeming paradise that was also 
the end of the frontier. 

Appropriately enough, Ape and Essence 
culminates in a Hollywood happy ending, at 
least for Poole and Loola, the young woman 
he falls in love with. The lovers escape to 
northern California, where a colony of 
“hots” — hold-outs with conventional sexuality
— are cobbling together a new life.
However disdainful Huxley might have been 
of our core boy-gangsters, in the end, he was 
too humane for a truly relentless apocalypse: 
his dystopias had escape hatches. Would that
the same could be said of a real nuclear war.


Richard Rhodes’s latest book, Energy: A
Human History, will be published in the 
United States in late May. 

e-mail: richardrhodes1@comcast.net 
The X Club

Ruth Barton UNIVERSITY OF CHICAGO PRESS (2018) 

For decades in the late 1800s, nine scientific luminaries
(among them biologist Thomas Henry Huxley and botanist
Joseph Dalton Hooker) dined together as members of the ‘X Club’. 
This socio-economically diverse group, formed in part to promote 
Charles Darwin’s achievements, is a telling case study in the 
dynamics of Victorian class and science. Historian Ruth Barton’s 
magisterial chronicle traces the careers of the “X-men” and their 
agile promotion of science; Huxley, in particular, emerges vividly as 
wily, belligerent, and obstructive to women entering science. 


Fallout 

Fred Pearce BEACON (2018) 

Science writer Fred Pearce casts a cool and measured eye on an 
explosive legacy: the atomic age. Launched by Winston Churchill’s 
nuclear ambitions (realized by the US Manhattan Project), this
era lingers on in plutonium stockpiles, arsenals and ageing power
plants. Pearce roams with intent from Sellafield, “Britain’s brooding 
nuclear nightmare”, to radioactive steppes in Kazakhstan, blighted 
by 619 atomic tests in the 1950s. His nuanced conclusion is that, 
together, alarmist protestors and a secretive nuclear industry create a 
different sort of fallout: the spread of disinformation and fear. 


Randomistas 

Andrew Leigh YALE UNIVERSITY PRESS (2018) 

Randomized testing, economist Andrew Leigh reminds us, has 
vanquished scurvy, improved wildfire response — and proved key to 
better feedback loops in medicine and crime prevention. The trove of 
case studies in his insightful study includes the 1960s Perry Preschool 
Project, which exposed the long-term positive impact of early 
education among African American children living in poverty. Leigh 
also explores the work of pioneering ‘randomistas’ such as social- 
policy expert Judith Gueron, and outlines handy guidelines on aspects 
of randomized testing, such as sample splitting and ethical oversight. 


Tasting the Past 

Kevin Begos ALGONQUIN (2018) 

If you can tell Sauvignon blanc from Sémillon, you might feel that 
you ‘know’ wine. Science journalist Kevin Begos blows that idea
to smithereens. He travelled from the Caucasus Mountains to
Israel and beyond, and riffled through archives, to unearth ancient 
‘founder’ grape varieties. En route, he consults archaeobiologist 
Patrick McGovern and grape geneticist Shivi Drori; reads papers 
on the DNA of “wild yeasts that live symbiotically with wasps”; and 
contemplates the oldest grape fossil found. A book that froths with 
data on half-forgotten vines, from Hamdani to Gros Manseng. 


The Waterless Sea: A Curious History of Mirages 

Christopher Pinney REAKTION (2018) 

The illusory seas observed in sere deserts are not the only form
of mirage, notes Christopher Pinney in this alluring tour of the
phenomenon in science and culture. Created by light refracting as 
it moves through atmospheric regions with differing temperatures, 
mirages can also appear as imposing and mysterious ‘castles in 
the air’. Pinney ranges from the old Japanese belief that these 
“phantom paradises” were exhaled by clam monsters, to an 1898 
Nature report detailing mirage effects on flagstone pavements. 

A paean to a sublime apparition, “real, but not true”.

Barbara Kiser
Women’s prize: be
more generous 


Your announcement of awards 
to celebrate women in science 
(Nature 556, 150; 2018) recalls 
another such prize announced 
more than 100 years ago (see 
Science 28, 832; 1908). 

The Sarah Berliner Research 
Fellowship for Women came 
about in 1909 thanks to the efforts 
of mathematician Christine Ladd- 
Franklin, who completed all the 
requirements for a PhD at Johns 
Hopkins University in Baltimore, 
Maryland, in 1882 but did not 
receive her doctorate until 1926. 
(Like many US universities
at the time, Johns Hopkins did not 
award PhDs to women.) 

Ladd-Franklin convinced 
Emile Berliner — inventor of the 
gramophone, the flat-disc record 
and a type of microphone used 
in the first practical telephones 
— to endow a fellowship for
female scientists in the name of 
his mother. 

The fellowship enabled 
female scientists with a PhD to 
spend one year doing research 
at a US university. The stipend 
was US$1,200 at a time when 
the average salary of “assistant 
professors in the leading 
universities” was $1,800 (Pop. 
Sci. Mon. 76, 615; 1910). The 
fellowship, today administered 
by the American Association of 
University Women, is now worth 
$30,000, matching the estimated 
25-fold increase in prices since 
1910. 

It might be argued on these 
grounds that the Nature awards 
should be doubled from the 
stipulated total of $15,200. 
Richard P. Elinson Huntington, 
New York, USA. 
elinson@duq.edu 


Women’s prize: act 
to boost all diversity 


We applaud your announcement 
to further promote gender 
equality through your Inspiring 
Science and Innovating Science 
awards (Nature 556, 150; 2018). 
More such initiatives are sorely
needed to address the severe
under-representation of people 
from minority racial and ethnic 
groups in science, technology, 
engineering, mathematics and 
medicine (STEMM). 

These inequalities have arisen 
from much the same social and 
historical drivers that have made 
the contemporary STEMM 
sphere biased in favour of men 
(N. A. Fouad and M. C. Santana
J. Career Assess. 25, 24-39; 2017).


Barriers to equality — such as 
sexism, poverty and racism — 
compound one another, making 
it even harder for members of 
multiple minority groups to 
pursue an academic career (see 
Nature 547, 266-267; 2017). 
Overcoming the planet's 
unprecedented challenges will 
demand all of our combined 
intellectual power — regardless 
of gender, race, ethnicity, 
sexuality, disability or any other 
diversity dimension that is 
currently under-represented in 
STEMM. 
Ricardo Rocha, Fangyuan Hua 
University of Cambridge, UK. 
1r552@cam.ac.uk 


India’s push for solar 
geoengineering 


India has been contributing
to the evaluation, discussion
and implementation of solar-
geoengineering research for
almost a decade, in line with
the call by A. Atiq Rahman
and colleagues for developing
countries to take the lead in this
realm (see Nature 556, 22-24;
2018).

The Indian government's 
Department of Science and 
Technology launched a 
major research initiative in 
2017 at the Indian Institute 
of Science in Bangalore to 
understand the implications
of solar geoengineering for
developing countries. The first
annual meeting of experts and 
policymakers to discuss how this 
research could be done in India 
was held in 2017. 

The department has also
funded a geoengineering
climate-modelling research
programme over the past 
five years. This has revealed, 
for example, how solar 
geoengineering could affect the 
global water cycle and extreme 
events and cyclones in the Bay of 
Bengal (see G. Bala and B. Nag 
Clim. Dyn. 39, 1527-1542; 2012; 
and A. Nalam et al. Clim. Dyn. 
50, 3375-3395; 2018). 
Furthermore, New Delhi's 
Council on Energy, Environment 
and Water has held three 
international conferences since 
2011 to identify India’s role 
in developing regional and 
global governance of solar- 
geoengineering research and 
technologies. 
Govindasamy Bala Indian 
Institute of Science, Bangalore, 
India. 
Akhilesh Gupta Department 
of Science and Technology, 
Government of India, New Delhi, 
India. 
gbala@iisc.ac.in 


Cooperate on 
research integrity 


As an institutional research-
integrity officer, I see first-hand
how cooperation between 
journals and institutions is 
crucial for addressing research 
misconduct. As both camps 
consider utilizing the latest tools 
for detecting image duplication 
(see, for example, Nature 555, 
18; 2018), it is important that 
they work closely together to 
deal with uncovered issues. 

Last year’s CLUE 
Recommendations put forward 
best practices for cooperation 
between editors and institutions 
to ensure research integrity 
and to protect the scientific 
record (E. Wager et al.
Preprint at bioRxiv https://doi.org/10.1101/139170;
2017). For example, institutions need to be
more willing to share information
with journals, including research-
misconduct reports. They should
also consider asking journals to
correct or retract publications
as soon as data are known to
be false, rather than waiting for
lengthy misconduct processes to
be completed. And when research 
misconduct is suspected, journals 
should consider contacting 
institutions directly so that raw 
data can be properly secured. 
Adopting best practices and 
cultivating strong partnerships 
are in everyone's best interests. 
Lauran Qualkenbush 
Northwestern University, Chicago, 
Illinois, USA. 
Ihaney@northwestern.edu 


Microbes set to alter 
the economy 


Gene editing stands to accelerate 
the engineering of microbes for 
industrial production of food 
ingredients, pharmaceuticals, 
biofuels and biomaterials. There 
is a risk, however, that microbial
biotechnologies could destabilize 
economies and employment in 
the developing world that depend 
on supplying naturally occurring 
ingredients. For example, a 
biosynthetic process for making a 
precursor of the antimalarial drug 
artemisinin has been developed, 
which could threaten the jobs of 
farmers who harvest its natural 
source, the plant Artemisia annua. 
Microbial processes hold 
promise for global sustainable 
development: they are cheaper, 
consume less energy and 
pollute less than oil-based 
manufacturing, and they 
use renewable feedstocks 
(V. de Lorenzo et al. EMBO 
Rep. 19, e45658; 2018). Yet it is 
imperative that international 
stakeholders assess and address 
any social-justice problems 
that could arise from such 
applications (see C. G. Acevedo- 
Rocha in Ambivalences of 
Creating Life 9-53; Springer, 
2016). Long-term commitment 
will be necessary to close the 
communication gap between 
scholars from different 
disciplines, cultures, values and 
generations (see also S. Jasanoff 
and J. B. Hurlbut Nature 555, 
435-437; 2018). 
Carlos G. Acevedo-Rocha 
Biosyntia, Copenhagen, Denmark. 
car@biosyntia.com 
Peter Grünberg


(1939-2018) 


Physicist who revolutionized data storage with work on magnetism in nanomaterials. 


Peter Grünberg was one of the first
to understand the potential
of nascent nanotechnologies for fun-
damental research. He discovered giant
magnetoresistance, or GMR: a large change in 
electrical resistance induced by a small mag- 
netic field in stacks of ultrathin magnetic and 
non-magnetic layers. For this, he won a share 
of the 2007 Nobel Prize in Physics (as did I; 
we independently discovered the same effect). 
Ultimately, his work led to hard-disk drives
with greatly increased data-storage capacity.
It also kicked off the field of spintronics.
Grünberg died on 7 April 2018, aged 78.

Grünberg was born in 1939 in Pilsen,
Bohemia, then a German protectorate, now
part of the Czech Republic. In 1945, his family
left for West Germany. There, at 19, Grünberg
went to study physics at the Goethe University
Frankfurt; he then did a PhD at the Technical
University of Darmstadt.

For his PhD, he used optical spectroscopy 
to determine the energy levels of rare-earth 
ions in magnetic garnet crystals. In his post- 
doc, he turned another spectroscopy tech- 
nique — Raman scattering — on garnets, 
this time at Carleton University in Ottawa, 
Canada, from 1969 to 1971. And in 1972, 
thanks to his expertise in the spectroscopic 
study of magnetic materials, Grünberg was
offered a post at the newly founded Institute
for Magnetism at the Jülich Research Centre
in Germany. 

Here, Grünberg quickly demonstrated his
pioneering spirit, developing the technique
of Brillouin light-scattering spectroscopy
(BLS). BLS examines the inelastic scattering
of light; it can probe both the
ground state of magnetic materials and their 
excited states. In the 1970s, physicists were 
struggling to pick up the specific excitation 
modes expected to occur at the surface of 
magnetic materials. Grünberg singled out
these modes, and identified them as spin 
waves of the Damon-Eshbach type. 

During a sabbatical at Argonne National 
Laboratory in Illinois in 1985, Grünberg
used an emerging technique of growing 
metals on single crystals to extend his BLS 
experiments to layers of magnetic materi- 
als less than 1 nanometre thick. This led to 
his first major discovery. In a sandwich of 
magnetic iron, non-magnetic chromium 
and more iron, he and his co-workers dem- 
onstrated the existence of antiferromag- 
netic exchange coupling between the iron 
layers across chromium (P. Grünberg et al.
Phys. Rev. Lett. 57, 2442; 1986). This was the 
first demonstration of a quantum effect in
magnetism. The coupling results from the 
interference between electronic wave func- 
tions reflected at the surface of the magnetic 
layers. For me, it was also the revelation of a 
nanostructure in which I could test some of 
my own ideas about magnetoresistance. 

Soon after, in early 1988, our two teams 
independently discovered GMR. Grünberg’s
group showed it in their iron—chromium- 
iron sandwich; my team in France, in a stack 
of 40 layers of iron and chromium. 

It was clear from the outset that GMR 
would have vast applications — especially 
because it could happen at room temperature. 
It could be used to detect small magnetic 
fields easily, which proved useful for mag- 
netic sensors and, in particular, for reading 
magnetic hard disks. It was also the first 
example of electronics that exploit both the 
spin and charge of electrons — a field today 
called spintronics. 

I met Grünberg in August 1988 at the
International Colloquium on Magnetic 
Films and Surfaces in Le Creusot, a small 
town in the middle of France. We presented 
our respective results, and came to the conclu- 
sion that we had made the same discovery. 
To our delight, most participants seemed to 
think it was important: we celebrated with a 
couple of glasses of local wine, a Pommard 
burgundy. That night, we had the feeling that 
the conference concert, with pieces played on 
piano and violin by some of our colleagues, 
was in celebration of our work. Grünberg, a
skilled guitarist, did not play that time, but I 
had several opportunities to hear him play. 

Grünberg had an impressive vision for
smart GMR-based devices, including the
spin-valve concept that he patented and was 
later developed at IBM for use in hard disks. 
The first commercial GMR-based hard disks 
appeared in 1997. Since then, their data- 
storage capacity has risen by almost three 
orders of magnitude. Grünberg conceived of
many other devices, from magnetic sensors 
to the compass used in smartphones today. 

The discovery of GMR kicked off an 
intense period of research activity, including 
experiments on GMR and interlayer 
exchange coupling in a great diversity of mag- 
netic multilayers. At the same time, the theory 
behind GMR was developed. Grünberg’s
Jülich team took a semi-classical approach;
I worked in collaboration with New York 
University on a quantum approach. 

The burgeoning field of spintronics has 
yielded fascinating results. In 1995, Terunobu 
Miyazaki and Nobuki Tezuka in Japan, and a
US group led by Jagadeesh Moodera, inde- 
pendently showed that quantum tunnelling 
of electrons between magnetic layers gives 
rise to much larger magnetoresistances 
than with GMR (T. Miyazaki and N. Tezuka 
J. Magn. Magn. Mat. 139, L231-L234; 1995; 
J. S. Moodera et al. Phys. Rev. Lett. 74, 3273; 
1995). In 1996, John Slonczewski at IBM in 
Yorktown Heights, New York, and Luc Berger 
at Carnegie Mellon University in Pittsburgh, 
Pennsylvania, introduced the concept that 
transferring spins between magnetic materi- 
als could create torque on their magnetization 
direction (J. C. Slonczewski J. Magn. Magn. 
Mat. 159, L1-L7; 1996; L. Berger Phys. Rev. 
B 54, 9353; 1996). Today, this mechanism 
is exploited to write non-volatile magnetic 
memories. Grünberg and his team contrib-
uted much to these research fields.

Grünberg was warmly esteemed by
colleagues around the world for his great crea- 
tive talent in physics, and for his integrity and 
modesty. To me, he was also a good friend. 
And I liked his sense of humour. Having 
shared emotional and amusing moments in 
Stockholm, we often recalled our procession 
to the Nobel banquet, when neither of us was 
completely successful in our attempts not to 
tread on the trains of the Royal Princesses. 

Peter was a great physicist, and a gentle and
sincere person. The spintronics and nano-
magnetism community will miss him sorely.
Safeguarding stem-cell fidelity


A complex that includes the protein HP1 binds to specific regulatory DNA sequences to promote local compaction of 
genomic regions and inhibit associated genes that drive differentiation of specific cell lineages. SEE LETTER P.739 


KRISTOFFER N. JENSEN 
& MATTHEW C. LORINCZ 


Mutations in the gene that encodes
activity-dependent neuroprotective
protein (ADNP) cause the rare
neurodevelopmental disorder Helsmoortel- 
Van der Aa syndrome, which is characterized 
by intellectual disability and autism spectrum 
disorder’. On page 739, Ostapcuk et al.” pre- 
sent a detailed study of the function of this 
transcription factor. They show that ADNP 
forms part of a nuclear complex that plays 
an essential part in maintaining the fidel- 
ity with which pluripotent stem cells give 
rise to the three primary cell lineages in the 
human body. 

DNA is intimately associated with many 
proteins, including transcription factors and 
the histones around which it is packaged. 
Together, this DNA-protein complex makes 
up chromatin. Gene-poor genomic regions, 
which are associated with low levels of 
transcription, adopt a condensed chroma- 
tin structure known as heterochromatin, 
whereas gene-rich regions adopt a relatively 
open structure called euchromatin. Although 
ADNP is associated with heterochromatin’, it 
has been unclear whether this interaction is 
relevant to the traits seen in ADNP-deficient 
mouse embryos — which include defects in 
the formation of the structure that gives rise 
to the brain and spinal cord, and aberrant 
expression of genes normally expressed in the 
extra-embryonic cells that support embryo 
development’. In addition, the spectrum of 
ADNP’s genomic targets in euchromatin has 
remained undefined. 

Ostapcuk et al. isolated the DNA sequences 
bound by ADNP in mouse embryonic stem 
(ES) cells — a type of pluripotent cell derived 
from mouse embryos. They identified about 
15,000 genomic sites at which ADNP was 
bound, most of which lay within genes or 
in nearby regulatory sequences that control 
gene expression. The authors showed that 
the expression of many of the genes that were 
bound by ADNP in wild-type mouse ES cells 
was upregulated in cells genetically engineered 
to lack this factor, supporting a direct role for 
ADNP in transcriptional repression. A subset
of these genes encode proteins that promote
differentiation into specific extra-embryonic
or embryonic tissues, in particular of the
endodermal lineage (one of the primary cell
lineages, which gives rise to the digestive and
respiratory tracts). And, whereas wild-type
mouse ES cells cultured under conditions that
promote neuronal differentiation gradually
became more neuron-like, ADNP-deficient
cells failed to do so, and showed aberrant
expression of endodermal genes.

Figure 1 | Controlling gene expression through chromatin compaction. DNA is packaged around histone proteins in a complex called chromatin. a, HP1 proteins bind to methyl groups on histone H3 (a chemical modification called H3K9me3) to promote large-scale condensation of DNA into heterochromatin — a structure associated with transcriptional repression. b, Ostapcuk et al.² report another role for HP1: interacting with the DNA-binding protein ADNP and the chromatin-remodelling protein CHD4 in a complex called ChAHP. This complex promotes local condensation of chromatin independent of H3K9me3, in regions of otherwise loosely packaged chromatin called euchromatin. Such binding inhibits the expression of associated genes, including those that promote stem-cell differentiation into cells of the endodermal lineage. c, In cells devoid of ADNP, the ChAHP complex is disrupted and CHD4 and HP1 are not recruited, resulting in locally decondensed chromatin, abnormal gene expression and spontaneous stem-cell differentiation.

Next, to determine whether ADNP mediates 
transcriptional silencing by recruiting co- 
repressors, Ostapcuk and colleagues used an 
unbiased approach to identify proteins that 
interact with ADNP. Their screening yielded 
two types of chromatin-associated protein. 
The first, CHD4, is a chromatin-remodelling 
protein previously implicated in the control 
of genes associated with pluripotency and 
differentiation®. The second, HP1, has a role 
in transcriptional repression and packaging of 


heterochromatin. Together, ADNP, CHD4 and 
HP1 form a stable complex that the authors 
dubbed ChAHP. 

ADNP probably interacts with HP1 proteins 
through a well-documented HP1-binding 
domain in ADNP called a PXVXL motif®. 
There are three HP1 isoforms (HP1α, HP1β
and HP1γ), but Ostapcuk et al. found that
only HP1γ and, to a lesser extent, HP1β inter-
acted with ADNP, consistent with a report
published earlier this year’. Furthermore, the
authors showed that most genomic regula-
tory regions bound by ADNP in wild-type
cells were also bound by HP1γ and HP1β. By
contrast, another study showed that ADNP
does interact with HP1α in embryonic can-
cer cells’. Reconciling these apparently con-
tradictory results, Ostapcuk et al. found that
ADNP can interact with HP1α, but only in the
absence of HP1β and HP1γ. When the authors
examined cells lacking different combinations
of HP1 isoforms, they found that all three had 
to be deleted to mimic the effects of ADNP 
mutation on gene expression. This indicates 
that the three HP1 proteins are functionally 
redundant — that is, they can compensate for 
one another in ChAHP. 

Why might ADNP preferentially interact 
with HP1γ and HP1β? HP1α has an amino-
terminal region, unique among HP1
proteins, through which it promotes the self- 
organization of heterochromatin into liquid- 
like droplets’. This process, called phase 
separation, probably minimizes interactions 
with other nuclear proteins, maintaining the 
condensed state of heterochromatin. However, 
ChAHP-mediated silencing of euchromatic 
genes must be reversible to enable genes to 
respond to differentiation cues during devel- 
opment, perhaps making phase separation less 
desirable at these genomic regions. 

HP1 proteins interact with PXVXL- 
containing factors such as ADNP through a 
carboxy-terminal ‘chromoshadow’ domain. 
But in heterochromatin, HP1 binds histone 
H3 through an evolutionarily conserved 
amino-terminal chromodomain’, at sites 
where the histone is tagged by methyl groups 
on amino-acid residue lysine 9 — a chemical 
modification dubbed H3K9me3. It is through 
this interaction that HP1 proteins induce 
chromatin condensation. Might H3K9me3 
also promote HP1 binding at ChAHP-bound 
regions? In favour of this hypothesis, HP1 
can recruit ADNP to H3K9me3-marked 
heterochromatin*®. However, Ostapcuk and 
colleagues show that ChAHP can efficiently 
bind to its DNA targets even if HP1 is engi- 
neered to lack its chromodomain. Further- 
more, regulatory sequences bound by ChAHP 
lacked H3K9me3. Therefore, H3K9me3 is 
unlikely to have a role in ChAHP-mediated 
transcriptional silencing. 

People with Helsmoortel-Van der Aa 
syndrome generally have ADNP mutations 
that produce a truncated protein lacking the 
DNA-binding domain and the PXVXL motif’. 
To investigate whether this mutation disrupts 
ChAHP-complex formation, Ostapcuk et al. 
expressed one such patient-derived mutant 
protein in mouse ES cells. The mutant ADNP 
failed to bind to HP1β or HP1γ, and genes
normally bound by ChAHP were aberrantly 
expressed. Furthermore, analysis of chroma- 
tin accessibility in ADNP-deficient ES cells 
revealed that the mutation led to opening of 
chromatin in regions immediately flanking 
ADNP-binding sites. 

Together, Ostapcuk and colleagues’ find- 
ings demonstrate that ChAHP does not 
generate broad swathes of heterochromatin, 
as observed at H3K9me3-marked regions 
bound by HP1. Instead, the complex gener- 
ates focused regions of condensed chromatin 
that inhibit the transcription of differentiation- 
promoting genes. Aberrant expression of such 
genes in the absence of the ChAHP complex
is probably a crucial factor in the aetiology of
Helsmoortel-Van der Aa syndrome (Fig. 1). 

Although Ostapcuk and co-workers’ study 
focused on the interplay between ADNP, CHD4 
and HP1, the researchers found many fewer 
genes upregulated in mouse ES cells lacking 
ADNP than in those lacking all three HP1 iso-
forms. And, consistent with previous reports”,
the authors’ analysis of HP1-interacting proteins
revealed a plethora of overlapping and isoform- 
specific binding partners, many of which have 
DNA-binding activity. Notably, mutations in 
several of these HP1-interacting transcription 
factors are implicated in other rare syndromes 
associated with intellectual disability, includ- 
ing in genes that encode the proteins AHDC1 
(ref. 12), CHAMP1 (ref. 13) and POGZ (ref. 14).
Thus, it is tempting to speculate that HP1 pro- 
teins act as co-repressors for many as-yet-unde- 
scribed DNA-binding complexes that regulate 
the expression of distinct gene sets. 

Ostapcuk and co-workers’ study has 
revealed a key mechanism of HP1 recruitment 
to chromatin. Their work sets the stage for 
future studies on the broader role of this enig- 
matic co-repressor in gene regulation and local 
chromatin compaction.
Molecular dynamics
simulated by photons 


The microscopic behaviour of molecules can be difficult to model using ordinary 
computers because it is governed by quantum physics. A photonic chip provides 
a versatile platform for simulating such behaviour. SEE ARTICLE P.660 


FABIEN GATTI 


Quantum-computing devices could one
day outperform ordinary computers,
particularly in the simulation of quantum
systems. Such devices share their quantum
nature with the system to be simulated and are 
therefore inherently suited to describing quan- 
tum phenomena’. On page 660, Sparrow et al.’ 
report a device based on a single photonic chip
that can simulate a range of quantum dynam- 
ics associated with different molecules. The 
results are in excellent agreement with simu- 
lations carried out by ordinary comput- 
ers, reaffirming the potential of quantum 
technology in this area. 

In conventional industrial chemistry, the 
yields of chemical processes are optimized 
by controlling macroscopic variables, such as 
temperature and pressure. But the use of high 
temperatures and pressures wastes a substan- 
tial amount of energy and generates unwanted 
by-products, leading to high energy consumption
and pollution. To overcome these issues,
a promising optimization approach exploits
the quantum nature of the reacting molecules. 

A central tenet of quantum physics is the 
superposition principle, which asserts that 
possible quantum states of a system can be 
added together and the result will be another 
possible state. The non-classical aspect of 
this principle is demonstrated, for example, 
by quantum bits. These objects can exist in 
both an on state and an off state at the same 
time. Such states exhibit quantum coherence, 
which means that they are correlated in a 
non-classical way. 
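The superposition principle described above can be illustrated with a minimal Python sketch. It assumes the simplest possible case, a single quantum bit whose "off" and "on" states are written as the basis states |0⟩ and |1⟩; the function names are hypothetical and purely illustrative, and nothing here is taken from Sparrow and colleagues' photonic device. Adding the two basis states and renormalizing yields another valid state, and the squared magnitudes of its amplitudes give the probabilities of finding the bit off or on when it is measured.

```python
import math

# Basis states |0> ("off") and |1> ("on"), each stored as a pair of complex amplitudes.
ZERO = (1 + 0j, 0 + 0j)
ONE = (0 + 0j, 1 + 0j)


def superpose(a, b, state_a, state_b):
    """Return the normalized state a|state_a> + b|state_b> (hypothetical helper)."""
    vec = tuple(a * x + b * y for x, y in zip(state_a, state_b))
    norm = math.sqrt(sum(abs(c) ** 2 for c in vec))
    return tuple(c / norm for c in vec)


def probabilities(state):
    """Probabilities (squared amplitude magnitudes) of measuring |0> and |1>."""
    return tuple(abs(c) ** 2 for c in state)


plus = superpose(1, 1, ZERO, ONE)    # equal mix of "off" and "on"
minus = superpose(1, -1, ZERO, ONE)  # a different state with the same probabilities
print(probabilities(plus))   # approximately (0.5, 0.5)
print(probabilities(minus))  # approximately (0.5, 0.5)
```

The two superpositions give identical measurement probabilities but differ in the relative sign of their amplitudes. A well-defined relative phase of this kind is one simple way to picture the coherence mentioned in the text; it is lost when the state is replaced by a classical mixture of "off" and "on".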

The ability to systematically control 
quantum coherence is considered one of 
the main challenges in energy science. Such 
control might enable the synthesis of highly 
desirable materials and devices, including 
superfluids (fluids that flow without resist- 
ance) and quantum computers. It could also 
give rise to more-efficient chemical processes 
than are currently possible. 

In conventional chemistry, the quantum 
states involved in chemical processes are 
incoherent. However, coherent superpositions
of molecular states can be produced using the 
light emitted by a laser. The ability to consist- 
ently generate these superpositions could 
improve the efficiency of the corresponding 
chemical processes and reduce the energy 
required to control such processes. It might 
even open up chemical-reaction mechanisms 
that are otherwise inaccessible™*. 

Laser pulses are the main tool for manipulat- 
ing molecules in this field. Improvements in 
the design of these pulses, such as increases in 
power and tunability, as well as the ability to 
reduce the duration of the pulses to attosecond 
(10⁻¹⁸ s) timescales, have enabled greater
control of light-induced processes in molecules’.
Since the pioneering work of Ahmed Zewail, 
who was awarded the 1999 Nobel Prize in 
Chemistry (see go.nature.com/2idzowg), 
laser pulses have been used to study quantum 
coherence in chemistry’. 

For example, quantum coherence has been 
used to enhance the rates of chemical reactions 
in biological systems at room temperature’. 
Such studies conclusively showed that quan- 
tum coherence can be partially preserved even 
in molecular systems open to the external 
environment’. 

These experimental advances call for accu- 
rate models of the quantum evolution of 
molecular systems. This is a challenging task 
for quantum-computing devices, although 
much progress has been made, thanks to the 
development of improved algorithms for simu- 
lating quantum dynamics”. 

Sparrow and colleagues engineered a 
quantum-computing device that is based on a
single photonic chip. They used the quantum 
superposition of photons in the chip to carry 
quantum information and to model molecu- 
lar systems. By adjusting the optical circuitry 
of the chip, the authors simulated a range of 
quantum dynamics associated with different 
molecules. 

The authors began by simulating vibrational 
excitations in a variety of four-atom molecules. 
They then modelled energy transport in the 
chemical bond of a protein and the transfer 
of vibrational energy in liquid water. Finally, 
they tested an algorithm designed to identify 
quantum states that can lead to the break-up 
of ammonia. The results of these simulations 
were in almost perfect agreement with those 
obtained using ordinary computers. 

The first quantum revolution occurred at 
the turn of the twentieth century, and provided 
us with the physical laws that govern reality. 
Sparrow and colleagues have now simulated 
the time evolution of a quantum superposition
of molecular states with the aid of an experi- 
mental device that uses the quantum super- 
position of photons. Such a feat suggests that 
we could be entering a second quantum revo- 
lution, in which the physical laws of nature are 
used to develop innovative technologies. 

Despite these promising prospects, it is not 
difficult to envisage the problems that follow-up
studies will encounter. In this seminal work, the
authors used rather simple molecular models,
involving a limited number of mathematical
terms. However, this number will increase
exponentially when aiming to closely reproduce
experimental conditions. Such an increase
might dramatically enhance what the authors
refer to as the “fundamental errors” in photonics,
which include the loss of photons and the
loss of quantum coherence.

642 | NATURE | VOL 557 | 31 MAY 2018

Nevertheless, Sparrow and colleagues have 
demonstrated that simulations carried out by 
quantum-computing devices can be both reli- 
able and efficient, by tackling problems that 
can be solved using well-established standard 
techniques. As the authors point out, slight 
improvements in their method could yield 
simulations that cannot be achieved using 
ordinary computers.
Gut molecules control
brain inflammation 


Metabolite molecules produced by the gut’s microbes activate immune cells in 
the brain called microglia, which signal to astrocyte cells to mediate responses to 
inflammation in the central nervous system. SEE LETTER P.724 


HARTMUT WEKERLE 


Some immunologists regard the central
nervous system (CNS) as a no-man’s-land,
avoided by immune cells and
therefore uninteresting. But, in fact, the 
CNS has a vigorous immune potential that 
remains dormant in normal conditions but 
is awakened after injury. The switch that con- 
trols the brain’s immune microenvironment 
involves non-neuronal cells called glia — not 
only microglia, which are sometimes called the 
immune cells of the CNS, but also multifunc- 
tional cells called astrocytes’. Rothhammer 
et al.” describe on page 724 how these two glial 
cell types communicate on a molecular level to 
influence inflammation in the CNS, and show 
that this interaction is controlled remotely by 
microbes that inhabit the gut. 

A decade ago, the group that performed 
the current study, along with another 
research group, discovered** an unexpected 
immunoregulatory role for a ligand-activated 
transcription factor called the aryl hydrocar- 
bon receptor (AHR), which at the time was 
best known as a receptor for environmental 
toxins’. The two groups showed that AHR 
modulates the progression of experimen- 
tal autoimmune encephalomyelitis (EAE) 
— an autoimmune disease in mice in which 
the immune system becomes overactive and 
attacks the CNS. EAE is often used as a model
of multiple sclerosis (MS). Initially, the groups
focused on how AHR might affect EAE by 
regulating pathogenic and protective subsets 
of immune cells outside the CNS. But it later 
emerged that AHR is also strongly expressed 
in the CNS, particularly in microglia and astro- 
cytes’, raising the question of whether AHR in 
the CNS has a role in autoimmune diseases. 

In the current study, Rothhammer et al. 
induced EAE in mice that had been geneti- 
cally engineered so that AHR could be deleted 
in microglia (but not in other brain cells or 
immune cells) by a drug treatment. Elimi- 
nation of microglial AHR substantially exacer- 
bated EAE in the AHR-depleted mice, but left 
immune responses outside the CNS unaltered. 
This finding suggests that AHR activation in 
microglia inhibits inflammation in the CNS. 

Microglia rarely act alone. Instead, they 
often team up with other cell types to respond 
to the stimuli that activate them. For example, 
after being activated, microglia can instruct 
certain astrocytes to attack local neurons’. 
Rothhammer and colleagues found that AHR- 
deficient microglia activated by EAE triggered 
exaggerated inflammatory responses in local 
astrocytes. Next, the authors used bioinformat- 
ics to analyse the gene-expression pathways 
altered in these glia. This analysis suggested 
that unexpected proteins signal from microglia 
to astrocytes. 

The usual suspects in such cases are 
pro-inflammatory signalling molecules, but
Rothhammer et al. showed that AHR in micro-
glia directly regulates the expression of genes
that encode the proteins TGF-α and VEGF-B
(Fig. 1) — neither of which has previously
received much attention from neuroimmunol-
ogists. Subsequent detailed in vitro and in vivo
analyses confirmed that TGF-α and VEGF-B
regulate the pro-inflammatory reactivity of
astrocytes. TGF-α dampens astrocyte inflammatory
responses to EAE, and its expression
in microglia is inhibited by AHR deletion. 
Conversely, VEGF-B enhances responses to 
EAE, and its expression is promoted by AHR 
deletion. 

So, the activity of microglia and astrocytes 
is modulated by AHR during brain inflamma- 
tion in autoimmune disease. But which signals 
might modulate microglial AHR? In addition 
to environmental toxins, AHR is bound by a
broad range of molecules, including dietary 
derivatives*. In particular, food plants such 
as broccoli and other members of the cab- 
bage family contain components that bind 
AHR either directly or after being processed 
into metabolite molecules, such as deriva- 
tives of tryptophan (Trp), by gut microbes’. 
Rothhammer et al. fed their mice diets either 
depleted or enriched in Trp. Trp depletion 
exacerbated EAE in wild-type mice, whereas 
enrichment ameliorated the effects of the 
disease. By contrast, neither diet had any effect 
on the progress of EAE in AHR-deficient 
animals, as might have been predicted — in 
these animals, Trp cannot bind to AHR to 
dampen immune responses. 

To determine whether their work is likely 
to have implications for humans, the authors 
verified basic elements of their analyses in tis- 
sue samples from people with MS, in which 
an autoimmune attack drives glial inflam- 
mation, destruction of nerve processes and 
their insulating myelin sheaths, and ulti- 
mately scar formation’®. The group found that 
AHR, TGF-α and VEGF-B were expressed in
microglia-like cells in MS tissues. Levels of the 
proteins were higher in newly inflamed regions 
than in old scar tissue or unaffected surround- 
ing tissue. This suggests (but does not prove) 
that TGF-α and VEGF-B have a role in the
formation of MS scar tissue. 

Rothhammer and colleagues’ work sheds 
light on the complex regulation of inflamma- 
tory reactivity in the CNS and adds another 
facet to our understanding of the gut-brain 
connection. Robust regulation of inflamma- 
tory responsiveness is essential for proper 
CNS function. Deficient regulation, with 
unrestrained inflammatory episodes, leads to 
sickness, irreversible cell loss and scar forma- 
tion’, whereas compromised inflammatory 
reactivity can result in tumour formation and 
opportunistic infection”. The authors’ find- 
ings are therefore likely to have implications 
beyond MS. 

Figure 1 | Long-distance regulation of immune cells in the brain. Gut bacteria process the dietary
component tryptophan to produce metabolite molecules that enter the central nervous system (CNS).
In the brain, these metabolites act as ligands for the aryl hydrocarbon receptor (AHR) — a transcription
factor expressed in cells called microglia and astrocytes that mediate responses to inflammation in the
CNS. Rothhammer et al.” report that, when activated in microglia, AHR binds to the genes that encode
the proteins VEGF-B and TGF-α, inhibiting the expression of the former and promoting that of the latter.
Any VEGF-B released from microglia enhances the responsiveness of astrocytes to CNS inflammation.
By contrast, TGF-α dampens astrocyte responsiveness.

The interactions between the gut, microglia
and astrocytes outlined by Rothhammer et al.
are not the only mechanisms that safeguard
inflammatory responses in the brain”’. It will
be of interest to examine how other regula- 
tors of the CNS immune microenvironment 
modulate the newly identified signalling 
pathway. These factors include the cells asso- 
ciated with the cerebral blood vessels, as well 
as active neurons. Indeed, pharmacological 
silencing of neurons leads to the activation of 
neighbouring microglia™. 

That the behaviour of microglia can be 
controlled remotely by intestinal products is 
intriguing, although not without precedent. 
A flurry of observations previously linked the
CNS to the gut and its microbial contents.
Neuronal pathways, microbial molecules and
metabolites are all involved in signalling
between these regions’. Specifically, short-chain
fatty acids produced
by gut bacteria can modulate microglia cells’®, 
and tryptophan metabolites act directly on 
astrocytes®. Nonetheless, the current findings 
broaden our understanding of the gut-brain 
connection. The authors speculate that this 
pathway might support the repair of injured 
neural cells. 

As Rothhammer and colleagues point out, 
their experimental observations might lead 
to new therapeutic approaches to quelling 
unwanted CNS inflammation, and possibly 
to supporting neuronal repair. First, enhancement
of TGF-α and blockade of VEGF-B
might reduce CNS inflammation to an acceptable,
non-toxic level. Second, clinical CNS
inflammation could be dampened indirectly
by means of the gut. Dietary protocols that 
promote anti-inflammatory regulation could 
be a promising non-invasive approach to treating
brain inflammation. It is to be hoped that
diets that have been proposed as effective med- 
ications for diseases such as MS, but whose 
effectiveness has yet to be formally proved”, 
will now be re-examined.
STRUCTURAL BIOLOGY


Enzyme illuminates 
bacterial ubiquitination 


Structural analysis reveals how a bacterial enzyme catalyses attachment of the 
protein tag ubiquitin to host proteins, illuminating a process that allows pathogenic 
bacteria to subvert host-cell function. SEE ARTICLE P.674 & LETTERS P.729 & P.734 


KATHY WONG & KALLE GEHRING 


Ubiquitination is a type of protein modification
in which the protein ubiquitin is
attached to a target protein. In eukary- 
otes (organisms that include fungi, plants and 
animals), the addition of a ubiquitin tag can 
act as a signal for various cellular processes. A 
prime example is the destruction of ubiquit- 
inated proteins by a eukaryotic protein complex 
called the proteasome. The ubiquitination pro- 
cess is also the target of many bacterial patho- 
gens, which have developed techniques to hijack 
it for their own benefit. In papers in Nature, 
Akturk et al.’ (page 729), Dong et al.’ (page 674)
and Kalayil et al.’ (page 734) describe the X-ray 
crystal structure of the bacterial enzyme SdeA, 
which catalyses ubiquitination. And, writing in 
Cell, Wang et al.’ report the structure of a bacterial
enzyme called SidE from the same protein
family as SdeA. 

The eukaryotic ubiquitination pathway 
requires a three-enzyme cascade”. An enzyme 
called E1 activates ubiquitin using a molecule
of ATP and a magnesium ion (Mg²⁺) to covalently
bind the ubiquitin through a type of
linkage called a thioester bond. The activated 
ubiquitin is then transferred to downstream 
enzymes, which attach ubiquitin through
an isopeptide bond to a lysine amino-acid
residue in the target protein. The discovery’ of 
the SidE family of ubiquitin ligase enzymes in 
the bacterial pathogen Legionella pneumophila 
revealed a ubiquitination pathway with strik- 
ing differences from the eukaryotic system. Not 
only can SidE ligases carry out the complete 
process without the aid of other enzymes, but 
this pathway also generates a different form of 
ubiquitin, termed phosphoribosylated ubiqui- 
tin (PR-Ub), in which a phosphoribose-sugar 
linkage attaches ubiquitin to the target protein’. 

The bacterial ubiquitination pathway 
requires a molecule of NAD⁺ instead of the ATP
and Mg²⁺ used by eukaryotes. In the first step,
the mono-ADP-ribosyltransferase (mART)
domain of SdeA uses ubiquitin and NAD⁺ to
covalently attach an adenosine diphosphate 
ribose (ADPR) molecule to an arginine resi- 
due (Arg42) of ubiquitin’, producing an ADPR- 
Ub molecule. The phosphodiesterase (PDE) 
domain of SdeA then cleaves this ADPR-Ub to 
release the molecule AMP, generating PR-Ub, 
and this group forms a bond with a serine residue
in the target protein’. However, the molecular
details of how ubiquitination is catalysed
were a mystery until now. 

The four papers’ * provide a detailed 
picture of the bacterial reaction pathway, with
complementary insights into the catalytic
mechanisms. Akturk, Dong and Kalayil, and 
their respective colleagues, report atomic 
structures of SdeA’s catalytic core, which 
consists of the PDE and mART domains. 

Dong et al. imaged the largest fragment 
of SdeA, which includes part of the protein’s 
carboxy-terminal domain. They observed that 
the C-terminal domain is required to anchor 
the PDE and mART domains, stabilizing the 
enzyme in an active conformation. The struc- 
ture of SidE reported by Wang and colleagues 
reveals that its catalytic domains are similar to 
those of SdeA, but the authors conclude that 
the C-terminal domain mediates SidE dimeri- 
zation. This disparity might result from dif- 
ferences in the experimental conditions used 
by the two groups, or might reflect special- 
ized functions of the individual SidE family 
members. 

The generation of ADPR-Ub from ubiquitin 
and NAD⁺ in the first step of the reaction is
revealed in a structure presented by Dong 
et al. of the mART domain in complex with 
ubiquitin and the molecule NADH, which is 
similar to NAD⁺ but can inhibit catalysis by the
enzyme. This revealed that the Arg42 residue 
in ubiquitin that becomes modified is located 
too far away from the ribose group of NADH 
for modification to occur directly. By contrast, 
another of ubiquitin’s arginine residues, Arg72, 
which was previously shown to be important 
in SdeA-mediated ubiquitination’, is located 
much closer to the enzyme-bound NADH. 
The authors used computer simulations of the 
complex, called molecular dynamics, to show 
that Arg72 and one other arginine residue 
(Arg74) anchor ubiquitin to mART. Once the 
nicotinamide group from NADH is released 
from the enzyme, a conformational change 
can occur, allowing Arg42 to replace Arg72 
in the active site. This model explains why 
ADPR attaches selectively to Arg42 and not to
other arginines in ubiquitin, but further study
is warranted to fully understand the process.

Figure 1 | The ubiquitination mechanism used by bacteria. Four
papers provide complementary insights into how bacterial enzymes
from the SidE family mediate the process in which the protein ubiquitin
(Ub) is attached to a target protein. Akturk et al.', Dong et al.” and Kalayil
et al.” report the structure of the enzyme SdeA, and Wang et al.’ present
the structure of the enzyme SidE. a, In the first step, the enzyme’s mART
domain processes NAD⁺ and adds an adenosine diphosphate ribose (ADPR)
group to the amino-acid residue arginine 42 (Arg42) of ubiquitin. This
generates ADPR-Ub in a reaction that releases nicotinamide. Dong et al.’
reveal that Arg72 of ubiquitin (not shown) helps to anchor ubiquitin to
the enzyme. b, ADPR-Ub is then processed by the enzyme’s PDE domain.
The molecule AMP is released in a reaction that generates ubiquitin
bound to a phosphoribosyl group (PR-Ub); the phosphoribosyl group, in
turn, is covalently attached to the enzyme’s amino-acid residue histidine
277 (His277). c, If a protein substrate enters the enzyme’s active site,
the enzyme catalyses the attachment of PR-Ub to a serine residue (not
shown) on the protein substrate. If, instead, water enters the active site,
unbound PR-Ub is released.

As the reaction proceeds, ADPR-Ub is 
processed by the PDE domain and PR-Ub 
is attached to a serine residue on a sub- 
strate protein. In addition to their studies of 
SdeA, Akturk et al. present the structure of 
ADPR-Ub in complex with SdeD, a member 
of the SidE family that contains only a PDE 
domain. Kalayil et al. used mass spectrometry 
techniques to study the SdeA catalytic inter- 
mediates at this stage. Both groups propose a 
two-step reaction mechanism for SdeA on the 
basis of studies of SdeA or SdeD. 

First, the Glu340 amino-acid residue of 
SdeA binds ADPR-Ub. The His277 residue 
of SdeA interacts with a phosphate group on 
ADPR-Ub, resulting in the release of a mol- 
ecule of AMP. Second, His407 activates the 
hydroxyl group of a serine residue on the tar- 
get protein, which enables the attachment of 
PR-Ub to the serine. Using a mutated version 
of SdeA in which the histidine residue at posi- 
tion 407 was replaced with asparagine to trap 
a catalytic intermediate, Kalayil et al. captured 
PR-Ub bound to His277 of SdeA (Fig. 1), con- 
firming the catalytic mechanism. Wang et al. 
report the structures of related complexes of 
ADPR and ubiquitin with SidE. 

If a water molecule enters the PDE domain's
active site instead of a serine amino-acid residue,
the reaction product released is unbound
PR-Ub. PR-Ub can inhibit host E1-dependent
ubiquitination because the PR modification 
prevents this form of ubiquitin from being 
a substrate for eukaryotic ubiquitination 
enzymes’. Kalayil et al. answered the question 
of whether the pathogenicity associated with 
SdeA arises from the generation of unbound 
PR-Ub or from the ubiquitination of host pro- 
teins. The authors tested bacterial mutants 
lacking SidE proteins that were engineered 
to express either wild-type SdeA or a mutant 
version of SdeA that generates only unbound 
PR-Ub. The authors observed that the bacteria 
that express mutant SdeA were unable to grow 
in host cells, indicating that the enzyme’s key 
role is ubiquitination of host proteins. 

The role of PR-Ub is an emerging topic in 
the field of ubiquitin research. These structures 
of SidE family members now pave the way for 
more questions to be answered. For example, 
how is ADPR-Ub shuffled between the PDE 
and mART domains? The active sites of the 
PDE and mART domains are far apart (55 ångströms)
and do not face each other. There is
conflicting evidence as to whether SidE proteins
exist as monomers or dimers, and, as a
result, there are different models of how the 
gap between the domains might be bridged. 

And what range of functions does the 
enzyme’s C-terminal domain have? The 
C-terminal domain stabilizes the catalytic core 
in SdeA but mediates protein dimerization in 
SidE. Dong et al. observed that ubiquitin mol- 
ecules bind to the C-terminal domain of SdeA 
and induce a large conformational change
in the enzyme, which suggests a possible
regulatory role for this domain. 

How many host proteins are ubiquitinated 
by SidE-family ligases? So far, only a few SdeA 
substrates have been identified®*”; these 
include the GTPase enzymes Rab and Rag, as 
well as the protein RTN4. From analysis of the 
ubiquitination sites in host proteins, Kalayil 
et al. and Wang et al. propose that the ligase 
enzyme specifically targets serine residues in 
disordered protein regions. 

Finally, perhaps the most exciting question 
still to be answered is this: do enzymes that 
mediate this type of ubiquitination process 
also exist in eukaryotes?
Spinning on the
edge of graphene 


Long-sought evidence has been found of magnetism at the edges of graphene, a 
two-dimensional form of carbon. The findings might enable the development of 
the logic gates needed for quantum computers. SEE LETTER P.691 


FERNANDO LUIS & EUGENIO CORONADO 


The 2D form of carbon known as
graphene has many potentially useful
properties, but is usually not magnetic
when pristine. However, theoretical predictions
suggest that the edges of graphene
sheets should become magnetic when they 
have a zigzag arrangement of carbon atoms’. 
Observing this effect has been challenging 
because of the difficulties of detecting the pre- 
dicted minute magnetic signal and because it 
is hard to fabricate defect-free edges that have 
the required shape. On page 691, Slota et al.’ 
report a method for making nanometre-wide 
graphene ribbons in solution, and thereby for 
producing nanoribbons with well-defined zig- 
zag edges ‘decorated’ with organic radical mol- 
ecules that bear electron spins — a quantum 
property of electrons that is associated with 
magnetism. The authors’ results provide solid 
evidence of magnetism at graphene edges, and 
show that edge spins have potentially useful 
quantum dynamics. 

Magnetic forms of graphene would be use- 
ful for spintronics, a technology that forms the 
basis of today’s magnetic data storage”*. But 
the main interest in generating magnetic edge 
states in graphene is for quantum technologies. 
Electron spins can adopt two orientations rela- 
tive to an external magnetic field, and these 
could be used to encode the ‘0’ and ‘1’ states
of a quantum bit (qubit), the basic informa- 
tion unit of future quantum computers and 
quantum-simulation devices.

The quantum states of a qubit must be
strongly coupled to external control stimuli
that drive the qubit’s operation, but they 
must also be isolated from random external 
perturbations that can irreversibly upset the 
‘coherent’ evolution of such quantum states 
(coherence is the existence of non-classical 
correlations between quantum states). In these 
respects, graphene has potential advantages” 
over other materials that are being investi- 
gated as hosts for spin qubits, such as gallium 
arsenide or silicon: electric currents flowing 
through a graphene sheet provide a means of 
coupling and manipulating spins; and the two 
main sources of decoherence are minimal in 
graphene. These sources of decoherence are 
the coupling between an electron’s spin and its 
orbital motion (which is weak in graphene), 
and interactions of electron spins with atoms 
that have nuclear spins (the concentration of 
which is low in graphene). 

Why has it been so difficult to observe 
magnetic edge states experimentally? The 
electronic and magnetic properties of gra- 
phene nanoribbons correlate closely with the 
structures of their edges, and are sensitive to 
even minute numbers of defects. Isolating a 
sufficient number of nanoribbons that have 
perfect zigzag edges to enable their magnetic 
characterization is extremely challenging, 
and so the data from such studies’ are scarce 
and inconclusive. Experiments performed on 
single graphene layers prepared in situ under 
a high vacuum have revealed the formation of 
local electronic states at edges, but did not pro- 
vide any evidence of magnetism’. 

© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.

| RESEARCH | NEWS & VIEWS

50 Years Ago

One of the best ways to spread plant
diseases is through the sale and
shipment of seed. In some cases,
such as celery leaf rust, only one
infected plant in 10,000 is needed
to cause an epidemic in the crop.
Particularly critical are fungal
diseases lodged within the seed ...
The problem in dealing with
these diseases has been to kill the
fungus but not the seed. This type
of disease can now be completely
eliminated by a process developed
at the National Vegetable Research
Station ... The treatment is first to
soak seed for twenty-four hours in
a solution containing 0.2 per cent
of the fungicide “Thiram” at 30 °C.
The seed is then dried by driving
air through it for several hours. So
far this treatment has been found
to give complete control in eleven
commercially important plant
species with infections involving
eighteen different seed-borne
diseases.

From Nature 1 June 1968

100 Years Ago

The trustees of the British Museum
have published a report on an
investigation carried out ... to
ascertain how and when the
infestation of Army biscuits by
flour-moths takes place, and
whether any steps can be taken to
prevent this. A list is given of eight
species of beetles and four Pyralid
moths that were actually found in
the tins of biscuits examined. But
by far the most serious pest was
the moth Ephestia kühniella ...
Evidence is adduced indicating that
Central America is probably the
original home of E. kühniella, the
so-called Mediterranean flour-moth.
The examination of various
intact airtight tins showed that the
biscuits contained in them were
infested, thus indicating that the
moths had gained access to them in
the factory prior to packing.

From Nature 30 May 1918

By expanding a previously developed
chemical method’, Slota et al. synthesized
graphene nanoribbons in solution that have 
uniform widths and zigzag edges. The authors 
attached nitronyl nitroxide molecules — 
chemically robust organic radicals, which 
are magnetic because they carry an unpaired 
electron — to specific edge sites (Fig. 1). This 
method produces large amounts (milligram 
quantities) of chemically stable graphene 
nanoribbons that can be studied using conven- 
tional spectroscopic techniques. The authors 
show that the electron spins at the radicals 
induce a spin density at the edge carbon sites 
where the radicals are bonded, and therefore 
induce magnetic edge states. This trick is 
akin to moving a row of corks on a string up 
and down on the surface of a pool to induce 
ordered water oscillations at the pool’s edge; 
not only do the corks induce waves, but they 
also make them easier to visualize. 

Besides proving the existence of magnetic 
edge states, Slota and colleagues’ experiments 
provide the first direct determination of the 
strength of the tiny spin-orbit coupling in their 
system. These findings will help to validate 
theoretical models of the electronic structure 
of graphene and its edge states’. 

The authors also measured the characteristic 
rates at which spins relax (reach equilibrium 
with the graphene lattice) and the time taken 
for them to lose coherence. The measured 
decoherence times are roughly one micro- 
second at room temperature — which is 
promising, because it means that spin coher- 
ence is preserved for much longer than has 
previously been measured in graphene elec- 
tronic devices. A plausible explanation for 
this is that the graphene nanoribbons are free 
from the structural randomness and extrin- 
sic effects (such as spin scattering caused by 
connecting graphene to electrodes) that have 
suppressed spin coherence in other systems’. 
Slota et al. find that decoherence in their nano- 
ribbons seems to be mainly associated with 
interactions of the electron spins with nuclear 
spins in the radical molecule. This is good 
news, because chemical methods are available 
to reduce the concentration of nuclear spins, or 
to make spin qubits insensitive to the magnetic 
noise generated by nuclear spins’”. 

Finally, the authors showed that unpaired 
electrons at the radicals interact with the edge 
spins. These interactions might allow graphene 
to be used as a coherent communication chan- 
nel between different radical spins, and might 
therefore serve as the basis of the two-qubit 
logic gates necessary for a quantum computer. 

Slota et al. show that the attachment of 
magnetic molecules to graphene creates coher- 
ent magnetic states on it, nicely complement- 
ing previously reported experiments” that 
showed how graphene influences the electron 
spins on molecules deposited on it. How- 
ever, in the authors’ system, electron spin is 
‘injected’ into the nanoribbons from the radi- 
cal molecules — so the intrinsic magnetism 
of graphene edges remains to be investigated. 


Figure 1 | Electron spins at graphene edges. Slota et al.’ have made ribbons of graphene (grey) that have zigzag edges (black), and free-radical molecules (blue) attached at specific sites. Each molecule has an unpaired electron, which has an associated quantum property known as spin (blue arrows). The molecules stabilize the carbon nanoribbons and perturb electrons in the graphene at the edges (purple ripples), generating electron spins at the edges (red arrows).


One way to explore this would be to attach 
non-magnetic molecules, rather than free 
radicals, to the graphene edges. 

A formidable challenge in the development 
of the reported nanoribbons for quantum 
computers will be to design a system that 
can manipulate and read out each qubit in a 
nanoribbon, and that can switch interactions 
between qubits on and off, in a way that also 
allows the computer to expand to incorporate 
more qubits without losing control of them. 
This will probably require graphene nanosheets
to be coupled to a solid-state device, so it
remains to be seen how coupling to the device
will affect spin coherence.

Moreover, if the strength of the spin-orbit 
coupling of edge-modified graphene nano- 
ribbons can be increased, then the spin at the 
attached molecules could be manipulated 
using an electric field. Such strengthening 
might be achieved by replacing the organic 
radicals with molecular metal complexes — 
which would require new chemical methods. 
It therefore seems that chemists hold the key to 
technologies and scientific discoveries involv- 
ing magnetic graphene.
Yeast shuffles towards
a diverse future 


A redesigned yeast genome is being constructed to allow it to be extensively
rearranged on demand. A suite of studies reveals the versatility of the genome-
shuffling system, and shows how it could be used for biotechnology applications.


JEE LOON FOO & MATTHEW WOOK CHANG 


A global consortium of scientists is well on the way to making a synthetic genome for the yeast Saccharomyces cerevisiae’ — the first synthetic genome for a member of the group of organisms known as eukaryotes, which includes plants, animals and fungi. Embedded within the extensively redesigned ‘version 2.0’ genome of S. cerevisiae (Sc2.0) are DNA sequences that form part of a system known as Synthetic Chromosome Rearrangement and Modification by LoxP-mediated Evolution (SCRaMbLE). This system allows extensive reorganization of the genome to be triggered on demand, generating Sc2.0 variants that have diverse genetic make-ups and characteristics. Sc2.0 is therefore a versatile platform that can be easily modified and evolved to produce yeasts that have desired attributes”. A collection of seven papers*” published in Nature Communications demonstrates the immense potential of Sc2.0 for engineering and understanding yeast.
To enable SCRaMbLE, a palindromic DNA sequence known as loxPsym is inserted after every non-essential gene in the synthetic genome. In the presence of the enzyme Cre recombinase, the loxPsym sites undergo recombination with each other — that is, the loxPsym sequences break in the middle, and the broken ends can then join up with any other available loxPsym ends. This process results in genes being randomly deleted, inverted, relocated and duplicated.

In the original design of the SCRaMbLE system’, Cre recombinase was produced only once during the lifetime of a cell, and was fused to a protein domain that binds oestradiol molecules — which allowed the enzyme to be activated by adding oestradiol to the yeast’s growth medium, providing an on-off switch for genome rearrangement (Fig. 1). However, some ‘background’ genome rearrangement occurred even without oestradiol activation. This version of SCRaMbLE was functional''’, but four of the new papers now report improvements to the system.

Shen et al.’ have modified SCRaMbLE to produce multiple pulses of Cre recombinase (instead of just one per lifetime) to increase rearrangement events while reducing background Cre recombinase activity. Jia et al.’ have developed a SCRaMbLE variant in which both oestradiol and galactose molecules are required to activate rearrangement, also reducing background rearrangement. Hochrein et al.’ have engineered Cre recombinase so that it is activated by red light, providing a new way to control SCRaMbLE. And Luo et al.° have introduced a reporter DNA sequence into a synthetic yeast strain, which allows cells that have undergone SCRaMbLE-induced genome rearrangement to be easily distinguished from those that have not. All four improvements facilitate effective and efficient implementation of SCRaMbLE.

An important application of SCRaMbLE is to generate genetically diverse pools of yeast mutants from which strains that have industrially valuable characteristics can be isolated. For example, yeasts can be genetically engineered to produce useful compounds, and Blount et al.’ show that SCRaMbLE can generate yeast strains that produce antibiotics (violacein or penicillin) in greater quantities than could be achieved without SCRaMbLE. Blount and colleagues also used the system to produce yeast strains that use the sugar xylose for growth more effectively than strains produced without SCRaMbLE; xylose is poorly used by wild-type yeast, but is abundant in biomass and is therefore an attractive alternative to the sugars normally used to feed yeast in industrial applications. And Luo et al. have used their SCRaMbLE variant to accelerate the isolation of yeast strains that are tolerant to various stress factors, such as ethanol, heat and acetic acid.

Jia and co-workers report that production of β-carotene molecules can be drastically increased if SCRaMbLE is used in diploid yeasts, which have two copies of the genome, instead of haploids, which have a single copy. Similarly, Shen et al. used SCRaMbLE in diploids to improve the heat or caffeine tolerance of hybrid yeasts (organisms produced by crossing two different yeast species or subspecies).

Figure 1 | Genome rearrangement on demand. A synthetic genome of the yeast Saccharomyces cerevisiae is being constructed that allows the genome to be rearranged using a system known as Synthetic Chromosome Rearrangement and Modification by LoxP-mediated Evolution (SCRaMbLE). In the first version of this system, a palindromic DNA sequence known as loxPsym is inserted after every non-essential gene, and a protein consisting of the enzyme Cre recombinase attached to an oestradiol-binding domain (EBD) resides in the yeast cytoplasm. When the protein is activated by the binding of an oestradiol molecule, it moves into the nucleus (a) where it cleaves the loxPsym sequences (b). The broken ends of loxPsym can then join up with any other available loxPsym ends, rearranging the genome. This process results in genes (such as the coloured rectangle) being randomly inverted, duplicated, relocated or deleted. Seven papers’ ’ now report improvements and applications of the SCRaMbLE system.
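The break-and-rejoin behaviour of loxPsym sites can be mimicked with a toy model. The sketch below is only an illustration under stated assumptions, not software from the Sc2.0 consortium: it treats a chromosome as a list of (gene, orientation) pairs, assumes a loxPsym site between every pair of adjacent segments, and gives deletion, inversion, duplication and relocation equal odds. The function name `scramble` is hypothetical.

```python
import random


def scramble(segments, n_events, rng):
    """Toy model of SCRaMbLE-style rearrangement (hypothetical, for illustration).

    segments: list of (gene_name, orientation) pairs, orientation +1 or -1.
    A loxPsym site is assumed between every pair of adjacent segments, so any
    two sites bound a block of genes. Each event deletes, inverts, duplicates
    or relocates one such block, chosen with equal probability (an assumption).
    """
    chrom = list(segments)  # work on a copy; the input list is left unchanged
    for _ in range(n_events):
        if not chrom:  # every gene has been deleted
            break
        # Pick two distinct loxPsym positions (0..len) that bound a block.
        i, j = sorted(rng.sample(range(len(chrom) + 1), 2))
        block = chrom[i:j]
        rest = chrom[:i] + chrom[j:]
        op = rng.choice(["delete", "invert", "duplicate", "relocate"])
        if op == "delete":
            chrom = rest
        elif op == "invert":
            # Inversion reverses gene order and flips each orientation.
            flipped = [(gene, -orient) for gene, orient in reversed(block)]
            chrom = chrom[:i] + flipped + chrom[j:]
        elif op == "duplicate":
            # Tandem duplication: a second copy follows the original block.
            chrom = chrom[:j] + block + chrom[j:]
        else:
            # Relocation: cut the block out and reinsert it at a random site.
            k = rng.randint(0, len(rest))
            chrom = rest[:k] + block + rest[k:]
    return chrom


rng = random.Random(42)
genome = [(f"gene{k}", +1) for k in range(8)]
for _ in range(5):
    variant = scramble(genome, n_events=3, rng=rng)
    print(" ".join(f"{g}{'+' if o > 0 else '-'}" for g, o in variant))
```

Because each event acts on whichever block happens to lie between two randomly chosen sites, repeated runs from the same starting genome yield a pool of different variants, which is the property that screening experiments exploit.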
Both groups observed genome rearrangements in diploids that involved the deletion of one copy of essential genes. The presence of such rearrangements in improved diploid strains shows that, compared with haploids, diploids are more robust to deleterious deletions during SCRaMbLE. This in turn allows a greater number of beneficial rearrangements to be manifested. Although it is premature to claim that SCRaMbLE is a universal tool for engineering yeast, taken together, the various findings* 7 certainly show that it has great potential for generating yeasts for a wide range of purposes.
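The argument that diploids tolerate deletions better can be illustrated with a toy Monte Carlo model. This is a sketch under made-up assumptions, not an analysis from the papers: the gene counts, the 20% share of essential genes and the name `survival_fraction` are all hypothetical. A haploid cell is lost as soon as the single copy of an essential gene is deleted, whereas a diploid survives unless both copies go.

```python
import random


def survival_fraction(ploidy, n_genes, essential, n_deletions, trials, seed=0):
    """Fraction of toy cells that stay viable after random gene-copy deletions.

    Hypothetical assumptions: every gene starts with `ploidy` copies; each
    deletion removes one copy of a uniformly chosen gene; a cell is viable
    as long as every essential gene keeps at least one copy.
    """
    rng = random.Random(seed)
    viable = 0
    for _ in range(trials):
        copies = [ploidy] * n_genes
        for _ in range(n_deletions):
            g = rng.randrange(n_genes)
            copies[g] = max(0, copies[g] - 1)
        if all(copies[g] > 0 for g in essential):
            viable += 1
    return viable / trials


essential_genes = set(range(0, 100, 5))  # hypothetical: 20 of 100 genes essential
for ploidy in (1, 2):
    frac = survival_fraction(ploidy, 100, essential_genes, n_deletions=10, trials=2000)
    print(f"ploidy {ploidy}: {frac:.2f} of cells viable")
```

Both runs use the same seed, so the diploid population experiences exactly the same sequence of deletions as the haploid one; any gap in viability comes only from the second genome copy, which is what lets more rearranged diploids survive to be screened.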

Wu et al.” have taken SCRaMbLE out of cells and used it in vitro with purified Cre recombinase to generate different genetic arrangements of the β-carotene biosynthetic pathway. They thus discovered arrangements that increase β-carotene production compared with the original pathway. By contrast, Liu et al.’ used an in vitro method, involving recombinase enzymes separate from the SCRaMbLE system, to rapidly generate different versions of β-carotene- and violacein-producing pathways and to identify highly productive ones. They then flanked the DNA sequences of the best pathways with loxPsym, and used SCRaMbLE to randomly incorporate the pathways at loxPsym sites in the synthetic yeast genome. SCRaMbLE concurrently rearranged the resulting genomes, allowing yeast strains to be optimized for the production of the desired compounds. These two papers illustrate the versatility of the basic SCRaMbLE concept and how it can be used in innovative ways.

So where next for Sc2.0? So far, six synthetic chromosomes of Sc2.0 have been completed”, and consortium members are working full-time to construct the remaining ten. The seven new papers show that researchers are eager to work with the newly available synthetic chromosomes to see how SCRaMbLE techniques can generate useful yeast variants and improve our understanding of the fundamental processes and properties of yeast. Thousands of loxPsym sites will be present in the fully assembled Sc2.0 genome, and so the number of genomic structures that can be generated by SCRaMbLE is immense — which suggests that it should be possible to produce a yeast variant that displays any desired set of characteristics.
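A rough count shows why the number of possible structures is called immense. As a simplifying assumption, treat the genome as n distinct segments separated by loxPsym sites and count only arrangements in which each segment appears exactly once, in any order and either orientation: n! × 2^n possibilities. The segment counts below are hypothetical and the function name is illustrative; working in log10 avoids overflowing on huge factorials.

```python
import math


def log10_signed_orderings(n_segments):
    """log10 of n! * 2**n: ways to order and orient n distinct segments.

    Counts only rearrangements that keep every segment exactly once
    (relocations and inversions); deletions and duplications would add more.
    """
    ln_count = math.lgamma(n_segments + 1) + n_segments * math.log(2)
    return ln_count / math.log(10)


for n in (10, 100, 1000):  # hypothetical segment counts
    print(n, round(log10_signed_orderings(n), 1))
```

Even this restricted count exceeds 10^2,800 for 1,000 segments, so only a vanishing fraction of possible genomes can ever be sampled, which is why screening methods matter so much.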

Nevertheless, SCRaMbLE systems are still in their infancy. Further improvements are needed, along with tools that maximize the potential of SCRaMbLE-based techniques. For example, the screening of SCRaMbLE-modified yeast has generally relied on visible cues, such as growth rate and colour (both β-carotene and violacein are pigments that colour the yeast cells). Luo and colleagues’ reporter offers a useful new screening tool, but high-throughput methods are also needed that can identify yeast strains that produce large amounts of colourless chemicals. Crucially, the characterization of genetic rearrangements relies heavily on whole-genome sequencing. The development of more-efficient, cheaper sequencing techniques would allow more strains to be sequenced than is currently possible, making it easier to work out and study the changes in their genomes. Given the promising early results and synergy among the members of the Sc2.0 consortium, the establishment of SCRaMbLE as a staple tool for engineering yeast is highly anticipated.
A catalyst for 50 years
of cancer research 


In 1968, a defect in DNA repair was found to underlie a disorder that makes 
people extremely sensitive to sunlight. This finding continues to influence 
research into the origins, diagnosis and treatment of cancer. 


RICHARD D. WOOD 


Some people are born with exceptional sensitivity to sunlight. Fifty years ago, writing in Nature, the biologist James Cleaver' reported a study of one such condition, and concluded that a failure of DNA repair was related to the extreme susceptibility of affected individuals to skin cancer. This was the first description of defective DNA repair in a genetically inherited disorder that makes people prone to cancer. The concepts that developed from this work now permeate research into the genetic origins of cancer and its treatment.

Starting in the 1870s, the Viennese derma- 
tologist Moritz Kaposi performed pioneering 
work that defined a rare disorder characterized 
by high sensitivity to sunlight. Young patients 
were severely burned by brief exposure to the 
sun and acquired frequent skin lesions, and 
some had a high incidence of skin tumours. 
Kaposi dubbed the condition xeroderma pig- 
mentosum’ (XP), using the Greek words for 
dry, pigmented skin — one of the symptoms 
of the disease. He recognized that this was a 
hereditary syndrome, but the underlying cause 
was not obvious. 

Little research into XP was then done until the 1960s, when a process called nucleotide excision repair was discovered in bacteria*>. In this process, enzymes clip out segments of DNA that have been damaged by light and replace them with fresh, undamaged DNA. Mutant bacterial strains were isolated that could be killed by low doses of ultraviolet radiation, and some of these were found to be unable to carry out excision repair*”.

These concepts of DNA repair were 
then extended to human cells. By 1964, 
the biologists Robert Painter and Ronald 
Rasmussen had discovered that UV irradia- 
tion of mammalian cells led to a phenomenon 
that they interpreted as excision repair’. In 
their experiments, cultured human cells were 
supplied with radioactive molecules (bases) 
that could be incorporated into DNA. The 
cells were observed to incorporate new bases 
after UV irradiation, even when they were not 
duplicating their genomes, indicating that 
UV-damaged DNA was being replaced. 

In 1967, Cleaver joined Painter’s laboratory 
in San Francisco as a postdoctoral fellow. 
Cleaver had obtained his PhD at the University 
of Cambridge, UK, where he had been using 
radioactive bases to label DNA in human cells. 
In April of that year, Cleaver read a news- 
paper article in the San Francisco Chronicle 
that mentioned research showing that skin 
cells grown from patients with XP were extra- 
ordinarily sensitive to UV radiation’. Cleaver 
raised with Painter the idea that XP might 
involve a mutation that causes DNA repair to 
be defective, and suggested investigating this possibility. Painter replied: “It’s a crazy idea, but at your stage what have you got to lose!”

Figure 1 | Evidence of defective DNA repair in cells from people with xeroderma pigmentosum. People born with the condition known as xeroderma pigmentosum (XP) are extremely sensitive to sunlight and are prone to skin cancer. In 1968, Cleaver’ reported experiments in which cultured cells from a healthy individual and from people with XP were irradiated with ultraviolet light to cause DNA damage and then analysed to see whether the cells incorporated radioactive molecules (bases) into their DNA. a, For the healthy cells, plots of measured radioactivity for different fractions of DNA revealed distinct peaks associated with DNA replication and DNA repair. b, By contrast, the XP cells lacked the repair-associated peak. This was the first evidence that defective DNA repair underpins a genetically inherited disorder that makes people susceptible to cancer.

Cleaver acquired cultures of growing skin 
cells from people with XP, and applied newly 
developed techniques*® to determine whether 
the cells were capable of excision repair. The 
results clearly showed that DNA repair was 
defective in XP cells that had been damaged by 
UV irradiation (Fig. 1). Painter was a generous 
mentor, and encouraged his junior colleague 
to pursue this major discovery independently. 
Cleaver’s results were published in Nature on 
18 May 1968. 

The paper's conclusions were strong. Cleaver 
used two completely different methods to 
show that DNA repair in XP cells is defective, 
using cells from three patients clinically veri- 
fied to have XP, and control cells taken from a 
patient with an unrelated hereditary disorder 
and from a healthy individual. The results sug- 
gested that XP is not a homogeneous disease, 
because cell lines from different individuals 
exhibited different levels of DNA repair. There 
was no indication, however, of which step was 
affected in the repair process, or which genes 
might be altered. Cleaver estimated that about 
70 DNA bases were incorporated in each 
repair event — not far from the actual num- 
ber of about 30 bases per repair event obtained 
later using more-precise methods”. 

The publication generated immediate 
excitement. DNA repair had previously been 
considered a somewhat obscure topic, but 
Cleaver showed that it had a key role in human
health. The Nobel-prizewinning molecular
biologist Joshua Lederberg penned an editorial
in The Washington Post highlighting this
important example of fundamental research 
that turned out to be relevant to disease". 
J. Michael Bishop, who won a Nobel prize in 
1989 for his work on oncogenes, which have 
the potential to cause cancer, was also influ- 
enced by the finding. He wrote*: “While I 
was still in medical school, James Cleaver 
recognized xeroderma pigmentosum as a 
deficiency in the repair of DNA damage 
caused by ultraviolet light... I have been a 
believer in the somatic mutation hypothesis 
of cancer ever since”. Somatic mutations are 
caused by DNA damage and copying errors in 
the genes of tumour cells as cancer progresses. 
Cleaver’s paper helped to stimulate the world- 
wide explosion of DNA-repair research that 
started in the 1970s’. 

Cleaver’s results were soon confirmed and 
extended by laboratories around the world. 
In 1972, it was reported that XP is a geneti- 
cally complex disease”, and it is now known 
that alterations in eight different genes can 
give rise to it’*"*. Seven of these genes encode 
components of the molecular machinery that 
performs excision repair; this machinery was 
biochemically reconstituted in vitro in the 
1990s'*'®. One form of XP, however, is caused 
by abnormal DNA synthesis after UV irra- 
diation”’, rather than by a problem in excision 
repair. 

Specific defects in DNA repair are now known to be associated with major neurological and developmental abnormalities in other UV-sensitivity disorders, including Cockayne syndrome’. More broadly, it has become clear that many of the XP-associated genes have functions in addition to excision repair, and several are essential for life'*'*. This means that only mild disablement of the functions of some XP genes can be tolerated.

Although XP is a rare disease (fewer than 1 person in 250,000 is affected in the United States and Western Europe)¹³, the consequences of mutations in XP genes are being
explored widely. For example, a recent analysis 
found that mutations in the XPD gene (also 
known as ERCC2) are fairly frequent in can- 
cer and might modulate individual responses 
to treatment”. There is also active research 
aimed at suppressing the action of XP proteins 
in tumour cells, to improve the effectiveness of 
chemotherapies that damage DNA”. 

There is still no cure for XP, but intensive 
research into the disease means that an early 
diagnosis can be made. People with XP can 
then be protected rigorously from sunlight, 
allowing them a greater quality of life and 
longer life expectancy than was previously 
possible. XP societies in the United States and 
Europe provide support for affected children, 
with retreats such as Camp Sundown and 
Owl Patrol. Retinoid compounds can reduce 
the incidence of skin tumours”, and dietary 
interventions might improve the prospects for 
people with XP and related disorders’’. More 
broadly, Cleaver’s discovery of the DNA-repair 
defect in XP continues to spawn vigorous 
research into responses to environmental DNA 
damage that applies not only to humans, but to 
every organism on the planet.
Emerging trends in global freshwater availability


M. Rodell¹*, J. S. Famiglietti², D. N. Wiese², J. T. Reager², H. K. Beaudoing¹,³, F. W. Landerer² & M.-H. Lo⁴


Freshwater availability is changing worldwide. Here we quantify 34 trends in terrestrial water storage observed by the
Gravity Recovery and Climate Experiment (GRACE) satellites during 2002–2016 and categorize their drivers as natural
interannual variability, unsustainable groundwater consumption, climate change or combinations thereof. Several of
these trends had been lacking thorough investigation and attribution, including massive changes in northwestern China
and the Okavango Delta. Others are consistent with climate model predictions. This observation-based assessment of how
the world’s water landscape is responding to human impacts and climate variations provides a blueprint for evaluating
and predicting emerging threats to water and food security.


Groundwater, soil moisture, surface waters, snow and ice are dynamic components of the terrestrial water cycle’. Although
they are not static on an annual basis (as early water-budget analyses supposed), in the absence of hydroclimatic shifts or
substantial anthropogenic stresses they typically remain range-bound. Recent studies have identified locations where
terrestrial water storage (TWS; the sum of these five components) appears to be trending below previous ranges, notably
where ice sheets or glaciers are diminishing in response to climate change*” and where groundwater is being withdrawn
at an unsustainable rate**.

Accurate accounting of changes in freshwater availability is essential for 
predicting regional food supplies, human and ecosystem health, energy 
generation and social unrest. Groundwater is particularly difficult to 
monitor and manage because aquifers are vast and unseen, yet ground- 
water meets the domestic needs of roughly half of the world’s population? 
and boosts food supply by providing for 38% of global consumptive irri- 
gation water demand". Nearly two-thirds of terrestrial aquatic habitats 
are being increasingly threatened", while the precipitation and river dis- 
charge that support them are becoming more variable’”. A recent study’! 
estimates that almost 5 billion people live in areas where threats to water 
security are likely—a situation that will only be exacerbated by climate 
change, population growth and human activities. Therefore, the key envi- 
ronmental challenge of the 21st century may be the globally sustainable 
management of water resources. 

Much of our knowledge of past and current freshwater availability 
comes from a limited set of ground-based, point observations. Assessing 
changes in hydrologic conditions at the global scale is exceedingly difficult 
using in situ measurements alone, owing to the cost of installing and 
maintaining instrument networks, the presence of gaps in those networks 
and the lack of digitization and sharing of existing data’. Satellite remote sensing has proven crucial to monitoring
water storage and fluxes in a changing world, enabling a truly global perspective that spans political boundaries". In
particular, since its launch in 2002, the GRACE mission!° has tracked ice-sheet and glacier ablation, groundwater depletion
and other TWS changes'®!°. On a monthly basis GRACE can resolve TWS changes with sufficient accuracy over scales that
range from approximately 200,000 km² at low latitudes to about 90,000 km² near the poles!. However, owing to GRACE’s
coarse spatial resolution, the inability to partition component mass changes and the brevity of the time series, proper
attribution of the TWS changes requires comprehensive examination of all available auxiliary information and data,
which has never before been performed at the global scale.

Here we map TWS change rates around the globe based on 14 years (April 2002 to March 2016) of GRACE observations (Fig. 1).
The GRACE data were processed using an advanced mass concentration”° (‘mascon’) approach that enables improved signal
resolution relative to the standard spherical-harmonic technique”!. Best-fit linear rates of change after removing the
seasonal cycle (referred to herein as ‘apparent trends’) are presented in Table 1 for 34 study regions. For context, the
largest man-made reservoir in the USA, Lake Mead, has a capacity of about 32 Gt; during the study period, all but one of
the 34 regions lost or gained more water than that, and eleven of them lost or gained more than ten times that amount.
The reported uncertainty bounds are typically low because the process of removing glacial isostatic adjustment (GIA)
signals is the only major source of error in the secular signal. Therefore, low uncertainty does not, on its own, imply
that the apparent trends existed before the GRACE period or will continue into the future. The coefficient of
determination (r²), which represents the ‘goodness of fit’ of the regressed linear trends, is included in Table 1 to
quantify the strength of the apparent trends relative to non-secular interannual variability. It is hence a useful, but
by no means conclusive, piece of evidence that can be used to predict whether the trend will be fleeting or enduring,
reflecting the cohesiveness of the TWS time series tendencies, as shown in Extended Data Figs. 1–4. We attribute the
trends to natural variability, direct human impacts or climate change and forecast the likelihood that they will
continue on the basis of 1979–2016 precipitation data from the Global Precipitation Climatology Project version 2.3
(GPCP)” (see Extended Data Figs. 5–8), an irrigated area map”, satellite-based lake-level altimetry time series**, Landsat
imagery and published reports of human activities including agriculture, mining, reservoir operations and inter-basin
water transfers. Further, for each region we provide the median climate model prediction of precipitation changes
between 1986–2005 and 2081–2100 using the Representative Concentration Pathway 8.5 (RCP8.5; 8.5 W m⁻² radiative forcing
in 2100 relative to pre-industrial levels) greenhouse gas emissions scenario from the Intergovernmental Panel on Climate
Change (IPCC) Fifth Assessment Report”. We chose the high-end (‘business as usual’) scenario because it accentuates
regional differences, which are more important for this analysis than absolute magnitudes are. Figure 2 presents maps
of the IPCC, GPCP and irrigated-area data.

Fig. 1 | Annotated map of TWS trends. Trends in TWS (in centimetres per year) obtained on the basis of GRACE observations
from April 2002 to March 2016. The cause of the trend in each outlined study region is briefly explained and colour-coded
by category. The trend map was smoothed
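The ‘apparent trends’ and their r² values can be illustrated with a short calculation. This is a sketch under assumptions, not the authors' processing chain: it assumes an evenly spaced monthly series with no gaps (real GRACE records have missing months), removes the seasonal cycle by fitting one mean per calendar month jointly with the linear trend, and reports r² as the share of the remaining non-seasonal variance explained by the line. The function name is hypothetical.

```python
import math


def trend_after_seasonal_removal(values, months, dt_years=1.0 / 12.0):
    """Linear rate (per year) and r^2 after removing a mean seasonal cycle.

    values: evenly spaced monthly TWS anomalies (e.g. in cm), no gaps assumed
    months: calendar month (1-12) of each value
    Needs at least two samples per calendar month.
    """
    t = [k * dt_years for k in range(len(values))]
    # Group (time, value) pairs by calendar month.
    groups = {}
    for ti, yi, m in zip(t, values, months):
        groups.setdefault(m, []).append((ti, yi))
    t_bar = {m: sum(p[0] for p in ps) / len(ps) for m, ps in groups.items()}
    y_bar = {m: sum(p[1] for p in ps) / len(ps) for m, ps in groups.items()}
    # Subtracting each month's mean from both time and data is equivalent to
    # fitting the trend jointly with one seasonal offset per calendar month.
    td = [ti - t_bar[m] for ti, m in zip(t, months)]
    yd = [yi - y_bar[m] for yi, m in zip(values, months)]
    s_tt = sum(a * a for a in td)
    s_ty = sum(a * b for a, b in zip(td, yd))
    s_yy = sum(b * b for b in yd)
    rate = s_ty / s_tt
    # Share of the non-seasonal variance explained by the linear trend.
    r_squared = s_ty * s_ty / (s_tt * s_yy) if s_yy > 0 else float("nan")
    return rate, r_squared


# Synthetic 14-year series: a -1.5 cm/yr decline plus an annual cycle.
months = [(k % 12) + 1 for k in range(14 * 12)]
series = [-1.5 * k / 12 + 4.0 * math.sin(2 * math.pi * k / 12) for k in range(len(months))]
rate, r2 = trend_after_seasonal_removal(series, months)
print(round(rate, 3), round(r2, 3))  # -1.5 1.0
```

With real data, wet and dry spells add interannual variance that a straight line cannot capture, pulling r² below 1; that is the sense in which the text treats r² as a hint of whether a trend is cohesive rather than as proof that it will endure.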


Global scale 
By far the largest TWS trends occur in Antarctica (region 1; −127.6 ± 39.9 Gt yr⁻¹ averaged over the continent), Greenland (region 2; −279.0 ± 23.2 Gt yr⁻¹), the Gulf of Alaska coast (region 3; −62.6 ± 8.2 Gt yr⁻¹) and the Canadian Archipelago (region 4; −74.6 ± 4.1 Gt yr⁻¹), where the warming climate continues to drive rapid ice-sheet and glacier ablation. Positive trends in sub-regions of Antarctica and Greenland result from increasing snow accumulation and millennial-scale dynamic thickening processes. Excluding those four ice-covered regions, one of the most striking aspects of changing TWS illuminated by Fig. 1 is that freshwater seems to be accumulating in far-northern North America (region 5) and Eurasia (region 6) and in the wet tropics, whereas the greatest non-frozen-freshwater losses have occurred at mid-latitudes. The observed trends are consistent with increasing rates of northern high-latitude precipitation during the study period and with the prediction of IPCC models that precipitation generally will decrease in mid-latitudes and increase in low and high latitudes by the end of this century. They also complement recent studies that identify increasing rates of precipitation in the tropics and increasing water storage and river discharge in the high Arctic. However, because the rates of TWS change (0.45 ± 0.43 cm yr⁻¹ and 0.17 ± 0.12 cm yr⁻¹ in regions 5 and 6, respectively) and the coefficients of determination (0.52 and 0.10, respectively) are small, while GIA-related errors are relatively large, we cannot state definitively that these high-latitude tendencies are real trends.
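Table 1 reports trends as mass rates (Gt yr⁻¹), while the paragraph above quotes regions 5 and 6 as equivalent water height (cm yr⁻¹). The two are linked through the region's area. The sketch below is a minimal illustration, not the paper's processing code. It assumes a water density of 1,000 kg m⁻³, so that 1 Gt of water occupies 1 km³, and it spreads the mass uniformly over the region. The function names are illustrative. Applied to the Table 1 values for regions 5 and 6 (6.1 ± 5.8 and 13.4 ± 9.7 Gt yr⁻¹), it reproduces the cm yr⁻¹ figures quoted in the text.

```python
# Assumed density of liquid water (kg m^-3): 1 Gt = 1e12 kg = 1 km^3 of water.
RHO_WATER = 1000.0

def gt_per_yr_to_cm_per_yr(mass_rate_gt: float, area_km2: float) -> float:
    """Convert a mass trend (Gt/yr) to equivalent water height (cm/yr),
    assuming the mass is spread uniformly over the region."""
    volume_m3 = mass_rate_gt * 1e12 / RHO_WATER   # Gt -> kg -> m^3
    area_m2 = area_km2 * 1e6                      # km^2 -> m^2
    return volume_m3 / area_m2 * 100.0            # m -> cm

def cm_per_yr_to_gt_per_yr(height_cm: float, area_km2: float) -> float:
    """Inverse conversion: equivalent water height (cm/yr) to mass rate (Gt/yr)."""
    return height_cm / 100.0 * area_km2 * 1e6 * RHO_WATER / 1e12

# Regions 5 and 6: trend, uncertainty and area taken from Table 1.
for name, trend_gt, err_gt, area in [
    ("Northern North America", 6.1, 5.8, 1_350_129),
    ("Northern Eurasia", 13.4, 9.7, 8_009_175),
]:
    print(f"{name}: {gt_per_yr_to_cm_per_yr(trend_gt, area):.2f} "
          f"+/- {gt_per_yr_to_cm_per_yr(err_gt, area):.2f} cm/yr")
# Northern North America: 0.45 +/- 0.43 cm/yr
# Northern Eurasia: 0.17 +/- 0.12 cm/yr
```

The same conversion shows why the region 11 loss is called intense. A rate of 5.5 Gt yr⁻¹ over about 215,000 km² corresponds to roughly 2.6 cm yr⁻¹ of water.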

A second characteristic of the map is that it reveals a clear ‘human fingerprint’ on the global water cycle. As seen in Fig. 2, freshwater is rapidly disappearing in many of the world’s irrigated agricultural regions. A third aspect of the global-trend map is natural interannual variability; many of the apparent trends are probably temporary, caused by oscillations between dry and wet periods (themselves driven by El Niño, La Niña and other climatic cycles) during the 14-year study period.
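The warning that many trends are partial cycles can be illustrated numerically. The sketch below fits an ordinary least-squares line, the kind of fit whose coefficient of determination (r²) Table 1 reports, to a hypothetical series that contains no long-term trend at all. The 30-year period, 5-cm amplitude and phase are invented for illustration. This section does not state whether the paper fits seasonal terms at the same time as the trend, so the code should not be read as its method.

```python
import math

def linear_trend(t, y):
    """Least-squares fit y = a + b*t; return the slope b and the
    coefficient of determination r^2 of the fitted line."""
    n = len(t)
    t_mean, y_mean = sum(t) / n, sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    ss_res = sum((yi - (intercept + slope * ti)) ** 2 for ti, yi in zip(t, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return slope, 1.0 - ss_res / ss_tot

# Hypothetical TWS anomaly (cm) with NO long-term trend: one 30-year wet/dry
# oscillation of 5 cm amplitude. The 14-year monthly record (168 samples)
# happens to span the oscillation's rising half.
PERIOD_YR, AMPLITUDE_CM, N_MONTHS = 30.0, 5.0, 14 * 12
t = [m / 12.0 for m in range(N_MONTHS)]
y = [AMPLITUDE_CM * math.sin(2 * math.pi * (ti - 7.0) / PERIOD_YR) for ti in t]

slope, r2 = linear_trend(t, y)
print(f"apparent trend: {slope:.2f} cm/yr, r^2 = {r2:.2f}")
```

The fit returns a sizeable positive slope with r² close to 1, even though the underlying signal averages to zero over a full cycle. A high r² within a 14-year window therefore does not, on its own, show that a trend will persist.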


Eurasia 
The hotspot in northern India (region 7) was among the first non-polar TWS trends to be revealed by GRACE. It results from groundwater extraction to irrigate crops, including wheat and rice, in a semi-arid climate. Fifty-four per cent of the area is equipped for irrigation. We estimate the rate of TWS depletion to be 19.2 ± 1.1 Gt yr⁻¹, which is within the range of GRACE-based estimates from previous studies of differently defined northern-India regions. The trend persists despite precipitation being 101% of normal (namely, the 1979-2015 GPCP annual mean for the region) during the study period, with an increasing trend of 15.8 mm yr⁻¹. The fact that extractions already exceed recharge during normal-precipitation years does not bode well for the availability of groundwater during future droughts. The contribution of Himalayan glacier mass loss to the regional trend is minor.
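The Table 1 footnote defines "percentage of normal" as the 2003-2015 annual mean precipitation relative to the long-term (1979-2015) GPCP annual mean. The sketch below applies that definition to a hypothetical series of annual totals; it is not GPCP data. Note that the baseline includes the study years, which slightly damps the departure from 100%.

```python
def percent_of_normal(annual_mm, study_years, baseline_years):
    """Mean annual precipitation over study_years as a percentage of the mean
    over baseline_years. annual_mm maps year -> annual total (mm)."""
    study_mean = sum(annual_mm[y] for y in study_years) / len(study_years)
    base_mean = sum(annual_mm[y] for y in baseline_years) / len(baseline_years)
    return 100.0 * study_mean / base_mean

# Hypothetical annual totals: 700 mm through 2002, 720 mm from 2003 onwards.
annual = {year: (700.0 if year < 2003 else 720.0) for year in range(1979, 2016)}
pct = percent_of_normal(annual, range(2003, 2016), range(1979, 2016))
print(f"{pct:.1f}% of normal")   # -> 101.8% of normal
```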

The increasing trend in central and southern India (region 8; 9.4 ± 0.6 Gt yr⁻¹) probably reflects natural variability of (mostly monsoon) rainfall, which was 104% of normal with an increasing rate of 3.7 mm yr⁻¹ (0.4% per year). Although the r² value is low (0.24), the TWS and rainfall trends are both consistent with the RCP8.5-predicted 23% precipitation increase by 2100.
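Table 1 lists precipitation trends computed "after removing the seasonal cycle", in both mm yr⁻¹ and % yr⁻¹; for region 8 the text quotes 3.7 mm yr⁻¹, about 0.4% per year. The sketch below shows one common way to obtain such numbers, under three assumptions that are not taken from the paper:

- the seasonal cycle is removed by subtracting each calendar month's mean;
- monthly values are expressed as annualized rates (mm yr⁻¹);
- the percentage is the slope divided by the long-term mean.

The series itself is synthetic, with an imposed increase of 3.7 mm yr⁻¹ per year.

```python
import math

def remove_seasonal_cycle(monthly):
    """Subtract each calendar month's mean (the climatology) from a monthly
    series that starts in January and covers whole years."""
    sums, counts = [0.0] * 12, [0] * 12
    for i, v in enumerate(monthly):
        sums[i % 12] += v
        counts[i % 12] += 1
    climatology = [s / c for s, c in zip(sums, counts)]
    return [v - climatology[i % 12] for i, v in enumerate(monthly)]

def ols_slope(t, y):
    """Ordinary least-squares slope of y against t."""
    t_mean, y_mean = sum(t) / len(t), sum(y) / len(y)
    sxy = sum((a - t_mean) * (b - y_mean) for a, b in zip(t, y))
    sxx = sum((a - t_mean) ** 2 for a in t)
    return sxy / sxx

# Hypothetical region: long-term mean 1,000 mm/yr, a strong monsoon cycle and an
# imposed increase of 3.7 mm/yr per year (values are annualized rates in mm/yr).
LONG_TERM_MEAN_MM = 1000.0
N_MONTHS = 14 * 12
t = [i / 12.0 for i in range(N_MONTHS)]
series = [LONG_TERM_MEAN_MM + 800.0 * math.cos(2 * math.pi * i / 12) + 3.7 * t[i]
          for i in range(N_MONTHS)]

trend = ols_slope(t, remove_seasonal_cycle(series))
pct_per_year = 100.0 * trend / LONG_TERM_MEAN_MM
print(f"{trend:.2f} mm/yr per year = {pct_per_year:.2f}% per year")
```

The recovered slope is slightly below 3.7, because subtracting each calendar month's mean also absorbs part of the within-year ramp. Fitting the trend and the seasonal harmonics together would avoid that small bias.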

Table 1 | TWS trends and supporting information

| Region | Location | Area (km²) | TWS trend (Gt yr⁻¹) | TWS trend error (Gt yr⁻¹) | r² of TWS trend | Irrigated area (%) | Precipitation trend (mm yr⁻¹) | Precipitation trend (% yr⁻¹) | Precipitation, percentage of normal (%) | Predicted precipitation change (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Antarctica | 12,397,401 | −127.6 | 39.9 | 0.93 | 0 | 0.82 | 0.40 | 96.6 | 30.9 |
| 2 | Greenland | 2,184,307 | −279.0 | 23.2 | 0.97 | 0 | 8.85 | 2.09 | 102.7 | 39.? |
| 3 | Gulf of Alaska coast | 716,492 | −62.6 | 8.2 | 0.93 | 0 | −3.03 | −0.25 | 95.0 | 21.3 |
| 4 | Canadian Archipelago | 672,413 | −74.6 | 4.1 | 0.95 | 0 | −5.33 | −2.11 | 94.5 | 38.3 |
| 5 | Northern North America | 1,350,129 | 6.1 | 5.8 | 0.52 | 0 | 2.35 | 0.73 | 105.1 | 26.9 |
| 6 | Northern Eurasia | 8,009,175 | 13.4 | 9.7 | 0.10 | 0 | 1.65 | 0.30 | 104.4 | 25.? |
| 7 | Northern India | 664,169 | −19.2 | 1.1 | 0.80 | 54 | 15.80 | 2.20 | 101.0 | 11.8 |
| 8 | Central India | 1,352,670 | 9.4 | 0.6 | 0.24 | 51 | 3.72 | 0.36 | 103.7 | 23.? |
| 9 | Eastern Central China | 657,375 | 7.8 | 1.6 | 0.78 | 14 | 7.33 | 0.77 | 99.6 | 7.9 |
| 10 | Tibetan Plateau | 881,704 | 7.7 | 1.4 | 0.67 | 0 | −1.52 | −0.90 | 104.2 | 19.7 |
| 11 | Northwestern China | 215,152 | −5.5 | 0.5 | 0.77 | 7 | 1.11 | 0.57 | 109.8 | 15.3 |
| 12 | North China Plain | 876,004 | −11.3 | 1.3 | 0.63 | 52 | −2.33 | −0.37 | 103.0 | 19.4 |
| 13 | Eastern India Region | 228,839 | −23.3 | 1.9 | 0.85 | 25 | −9.52 | −0.67 | 96.1 | 14.7 |
| 14 | Northwestern Saudi Arabia | 841,763 | −10.5 | 1.5 | 0.92 | 0 | −1.44 | −1.31 | 77.7 | −1.4 |
| 15 | Northern Middle East | 2,189,561 | −32.1 | 1.5 | 0.84 | 5 | −2.80 | −0.90 | 96.3 | −8.5 |
| 16 | Southwestern Russia Region | 1,772,712 | −18.1 | 1.3 | 0.64 | 15 | −5.83 | −0.92 | 96.8 | 6.2 |
| 17 | Aral Sea | 52,299 | −2.2 | 0.1 | 0.76 | 0 | 2.71 | 1.17 | 111.1 | 5.9 |
| 18 | Caspian Sea | 377,761 | −23.7 | 4.2 | 0.76 | 0 | −4.37 | −1.14 | 103.4 | 2.1 |
| 19 | Central Canada | 802,682 | −7.0 | 6.4 | 0.73 | 0 | 0.69 | 0.17 | 102.0 | 16.9 |
| 20 | Northern Great Plains | 333,598 | 20.2 | 4.8 | 0.79 | 3 | 2.26 | 0.44 | 102.0 | 7.0 |
| 21 | Southern California | 77,996 | −4.2 | 0.4 | 0.46 | 18 | −8.31 | −1.29 | 89.7 | 1.2 |
| 22 | Southern High Plains and eastern Texas | ?,105,113 | −12.2 | 3.6 | 0.44 | 9 | −5.71 | −0.76 | 95.2 | 2.8 |
| 23 | Patagonian ice fields | 461,198 | −25.7 | 5.1 | 0.89 | 0 | −8.01 | −0.76 | 97.1 | −6.9 |
| 24 | Central Argentina* | 530,661 | −8.6 | 1.2 | 0.77 | 4 | 1.87 | 0.32 | 94.2 | 0.7 |
| 25 | Central and western Brazil | 5,559,805 | 51.9 | 9.4 | 0.39 | 1 | 0.61 | 0.03 | 100.2 | −5.0 |
| 26 | Eastern Brazil | ?,132,450 | −16.7 | 2.9 | 0.39 | 1 | −16.97 | −1.61 | 97.7 | −5.9 |
| 27 | Okavango Delta | 989,692 | 29.5 | 3.5 | 0.55 | 0 | −5.21 | −0.61 | 105.3 | −8.7 |
| 28 | Nile headwaters | 824,276 | 21.9 | 3.9 | 0.56 | 1 | −3.53 | −0.30 | 97.7 | 11.6 |
| 29 | Tropical western Africa | 2,298,134 | 24.1 | 2.1 | 0.67 | 1 | −0.12 | −0.01 | 103.4 | −6.3 |
| 30 | Northern Congo | 318,261 | −7.2 | 1.0 | 0.26 | 0 | −1.55 | −0.10 | 99.1 | ? |
| 31 | Southeastern Africa | ?,677,719 | −12.9 | 2.3 | 0.47 | 0 | −3.23 | −0.32 | 95.9 | −5.9 |
| 32 | Northern Africa | 6,664,135 | −11.7 | 2.9 | 0.45 | 1 | −0.12 | −0.19 | 106.7 | −12.9 |
| 33 | Northern & Eastern Australia | 2,504,494 | 19.0 | 2.8 | 0.32 | 3 | 4.30 | 0.69 | 104.6 | −6.0 |
| 34 | Northwestern Australia | ?,002,367 | −8.9 | 1.2 | 0.43 | 0 | −0.39 | −0.10 | 99.1 | −0.6 |

Location; area; GRACE-based TWS trend (April 2002-March 2016) and uncertainty; coefficient of determination (r²) of the fitted linear trend; percentage of the area equipped for irrigation; trend in precipitation (January 2002-March 2016) after removing the seasonal cycle; annual mean precipitation (2003-2015) as a fraction of the long-term (1979-2015) annual mean; and median precipitation change between the periods 1986-2005 and 2081-2100, predicted using the IPCC high-end greenhouse gas emissions scenario, for each of the 34 study regions.

*The TWS trend in region 24 is for April 2002-February 2010 only.

The increasing trend in eastern central China (region 9) is caused by a surge in dam construction and subsequent reservoir filling across that region. The best known is the Three Gorges Dam reservoir, which was filled to its design capacity of 39.3 Gt between June 2003 and October 2010. The 14-year regional trend, 7.8 ± 1.6 Gt yr⁻¹, did not change appreciably after the Three Gorges Dam reservoir was filled. That can be explained by both the prevalence of other dam projects and the greater precipitation after 2010 (971 mm yr⁻¹, compared with 928 mm yr⁻¹ before 2010). Further, seepage from dams tends to raise the regional water table, which can continue for years before the system equilibrates. If precipitation trends towards an 8% increase by the end of this century, as predicted, then the observed TWS trend may persist even after the current dam-building boom, although probably at a slower pace.
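A back-of-envelope check with the Three Gorges figures quoted above helps put the region 9 trend in scale. The sketch assumes the entire 39.3 Gt was added within the June 2003 to October 2010 window, spread evenly, with the first of each month taken as the boundary. The actual filling was staged, so the result is only an average over the window, not a monthly fill rate.

```python
from datetime import date

# Values quoted in the text for region 9; the day of month is an assumption.
CAPACITY_GT = 39.3                 # Three Gorges design capacity (1 Gt of water = 1 km^3)
FILL_START = date(2003, 6, 1)      # "June 2003"
FILL_END = date(2010, 10, 1)       # "October 2010"
REGIONAL_TREND_GT_PER_YR = 7.8
PRECIP_BEFORE_MM, PRECIP_AFTER_MM = 928.0, 971.0

fill_years = (FILL_END - FILL_START).days / 365.25
# Assumes the full capacity was added within the window, spread evenly over it.
mean_fill_rate = CAPACITY_GT / fill_years
share_of_trend = mean_fill_rate / REGIONAL_TREND_GT_PER_YR
precip_change_pct = 100.0 * (PRECIP_AFTER_MM / PRECIP_BEFORE_MM - 1.0)

print(f"filling window: {fill_years:.2f} yr")
print(f"mean fill rate: {mean_fill_rate:.1f} Gt/yr ({share_of_trend:.0%} of the regional trend)")
print(f"post-2010 precipitation: +{precip_change_pct:.1f}%")
```

On these assumptions, the Three Gorges reservoir alone averages about 5.4 Gt yr⁻¹ over its filling window, roughly two-thirds of the 7.8 Gt yr⁻¹ regional trend. Post-2010 precipitation is about 4.6% higher than before 2010. Together these figures indicate why other reservoirs and the wetter post-2010 years are needed to account for a trend that did not slow once the reservoir was full.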

Satellite altimetry and Landsat data indicate that the majority of lakes in the Tibetan Plateau have grown in water level and extent during the 2000s owing to a combination of elevated precipitation rates and increased glacier-melt flows, which are difficult to disentangle. From 1997 to 2001 the average annual precipitation in region 10 was 160 mm yr⁻¹, well below the 2002-2015 average of 175 mm yr⁻¹; thus, the observed increase in TWS (7.7 ± 1.4 Gt yr⁻¹) may reflect replenishment after a prolonged dry period. Additional surface-water storage would have been partially offset by glacier retreat and warming-enhanced evaporation. GIA may further complicate the partitioning of the GRACE-derived mass-change signal over the Tibetan Plateau, but some have argued that the GIA contribution is negligible. The latter study noted that interannual mass variability in the region during the GRACE period is large relative to the inferred trend. We concur (r² = 0.67) and conclude that there is no basis for extrapolating the apparent TWS trend into the future. In fact, it appears to have reversed in 2013 (Extended Data Fig. 2). Although RCP8.5 predicts a 20% increase in precipitation by 2100, it is probable that warming-induced glacier-mass losses will begin to exceed surface-water gains, particularly if the fraction of frozen precipitation decreases.

Region 11 lies to the west of the city of Urumqi in northwestern China's Xinjiang province. During the study period, TWS depletion was intense: −5.5 ± 0.5 Gt yr⁻¹ from an area of only 215,000 km². Precipitation data indicate that drought was a non-factor. The glaciers of the Tien Shan mountain range, whose central third lies within region 11, are melting rapidly, but not rapidly enough to explain all of the mass loss. Groundwater is being withdrawn to support irrigated agriculture across the province and possibly to dewater coal mines. However, region 11 is contained within an endorheic basin. Hence, the additional surface water produced by ice melt and groundwater abstraction cannot flow far; yet the elevations of the five lakes within that basin either declined or were stable during the study period, and GRACE did not detect substantial TWS increases in other parts of the basin. We conclude that region 11 is losing glacier ice and possibly groundwater, which ultimately become evapotranspiration, both in irrigated agricultural areas to the north, south and west of the mountains and through evaporation from the desert floor to the south. Details are provided in Methods.

The vast agricultural region surrounding Beijing (region 12) is heavily irrigated (52%). Previous GRACE-based studies offered a wide range of estimates for groundwater depletion from the North China Plain aquifer (see Methods for details), which is encompassed by region 12 and supports much of that irrigation. Here we estimate a TWS change rate of −11.3 ± 1.3 Gt yr⁻¹ for region 12. During the GRACE period, the total annual precipitation was steady at about 10 mm yr⁻¹ above the 1979-2015 mean, following two dry years and a wet year during 2001-2003. All evidence suggests that this trend is human-induced and likely to continue until groundwater becomes scarce or regulations are put in place to reduce consumption rates.

The negative trend that extends across East India, Bangladesh, Burma and southern China (region 13), −23.3 ± 1.9 Gt yr⁻¹, may be explained by a combination of intense irrigation (25%) and a decrease in monsoon-season precipitation during the study period. The total annual precipitation was well above normal from 1998 to 2001, resulting in elevated TWS. During the GRACE period, precipitation declined at a rate of −10 mm yr⁻¹ (−0.7% per year), and the annual accumulations were below average from 2009 to 2015. This is the third most heavily irrigated of the study regions, so TWS decline is likely to continue, although perhaps at a slower rate, given that rainfall should normalize eventually and a 15% increase in rainfall is predicted by 2100.

NATURE | www.nature.com/nature
© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.

ANALYSIS

Fig. 2 | Trends in TWS and supporting data maps. (Bottom to top) TWS trends (in centimetres per year); percentage of area equipped for irrigation (%); trend in precipitation (in millimetres per year); mean annual precipitation (2003-2015) as a percentage of the long-term mean (in per cent); IPCC-predicted change in precipitation (in per cent). Areas outside of the study regions are shaded.


Decreasing water storage in the Middle East has been quantified using GRACE by previous studies. Here we split the affected area into two regions, northwest Saudi Arabia (region 14; −10.5 ± 1.5 Gt yr⁻¹) and northern Middle East (region 15, which includes eastern Turkey, Syria, Iraq and Iran; −32.1 ± 1.5 Gt yr⁻¹). The declines result from a combination of recent drought and consequent increases in groundwater demand. Average precipitation during the study period was 78% and 96% of the 1979-2015 means in regions 14 and 15, respectively, with a slightly declining trend (−1% per year) in both. Although the irrigation dataset indicates that less than 1% of region 14 is irrigated, Landsat imagery reveals the appearance and expansion of crop irrigation over the past three decades, supplied by non-renewable groundwater. However, the Saudi Arabian government ended its domestic wheat production programme in market year 2014-15. Thus, although some farms have continued to operate, it is likely that the depletion rate in region 14 will diminish, and TWS may already be stabilizing (Extended Data Fig. 2).
Region 15 has experienced a more complicated recent water history. Turkey’s construction of 22 dams upstream on the Tigris and Euphrates Rivers in the last three decades has considerably decreased the rate of flow into Iraq and Syria. Combined with long-term drought, this has forced widespread over-reliance on groundwater for both domestic and agricultural needs, and largely explains the large negative TWS trend. Surface and groundwater depletion is likely to continue in a stepwise fashion, with periods of near-stability during normal-to-wet years and rapid declines during drought years.

To the north, an adjoining zone of TWS depletion (region 16; −18.1 ± 1.3 Gt yr⁻¹) extends from Ukraine through western Russia and into Kazakhstan. As before, the root cause of this depletion is competition for scarce water resources, exacerbated by drought. Fifteen per cent of the area is irrigated, including fertile croplands that are vital to Russia. Precipitation during the study period was 97% of normal, with a decreasing trend of 6 mm yr⁻¹ (1% per year). As in region 15, surface- and groundwater depletion in region 16 is likely to continue as it has, stepwise, with substantial declines during drought years (2008, 2012 and 2014) and lesser recoveries in normal-to-wet years.

The water demands of regions 15 and 16 place severe pressure on the Aral and Caspian Seas (regions 17 and 18, respectively). The demise of the Aral Sea is well known. Our estimate of the mass change in what remains of it is −2.2 ± 0.1 Gt yr⁻¹. Water level fluctuations in the Caspian Sea have previously been attributed to meteorological variability and direct evaporation from the sea. We find that the annual discharge from the Volga River explains 60% of the variance in the annual mean level of the Caspian Sea, compared with 18% explained by evaporation from the sea. Interannual variations in Volga River discharge are nearly three times as large as interannual variations in evaporation, and the former are controlled by both precipitation changes and the water demands of crops, which cover 37% of the basin. Using crop-production data and other information, we establish that the −23.7 ± 4.2 Gt yr⁻¹ change rate of the water mass in the Caspian Sea observed by GRACE was caused in part by diversions and direct withdrawals of water from the rivers that sustain it (see Methods for details), mirroring the circumstances that doomed the Aral Sea. Because the Caspian Sea contains about 78,000 Gt of water, at the current rate it will survive for three more millennia, but a receding shoreline could be an issue.
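Two quantitative statements in the Caspian paragraph can be made concrete. First, "explains 60% of the variance" is conventionally the r² of a least-squares regression of sea level on a single predictor. This section does not give the underlying Volga discharge or evaporation series, so the sketch below defines the statistic but does not reproduce the 60% and 18% figures. Second, the "three more millennia" estimate is a linear extrapolation: 78,000 Gt divided by the GRACE rate of 23.7 Gt yr⁻¹. That extrapolation ignores feedbacks, for example changing evaporation as the surface area shrinks. Function names are illustrative.

```python
def variance_explained(x, y):
    """r^2 of an ordinary least-squares line of y on x (the squared Pearson
    correlation for one predictor): the fraction of y's variance explained by x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def years_until_empty(storage_gt, loss_rate_gt_per_yr):
    """Linear extrapolation: years until storage reaches zero at a constant loss rate."""
    return storage_gt / abs(loss_rate_gt_per_yr)

# Caspian Sea figures from the text: about 78,000 Gt stored, -23.7 Gt/yr from GRACE.
print(f"{years_until_empty(78_000, -23.7):,.0f} years")   # -> 3,291 years
```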

Three mass changes in Eurasia that are prominent in Fig. 1 are not associated with TWS at all. Crustal deformation accompanying the magnitude-9.1 Sumatra-Andaman earthquake of 2004 caused two of these mass changes, the dipole of positive and negative trends in Sumatra and the Malay Peninsula, respectively. The magnitude-9.0 Tohoku earthquake of 2011 caused the negative trend in Japan.


North America 

Ongoing GIA processes centred near Hudson Bay, where the Laurentide ice sheet was thickest 20 to 95 thousand years ago, require a correction of the mass rates observed by GRACE of up to 5-6 cm yr⁻¹ (equivalent height of water). However, GIA models are imperfect, and thus there is large uncertainty in the apparent decreasing TWS trend in central Canada (region 19) and some evidence that it may reflect an overcorrection of GIA. Nevertheless, here we estimate the rate to be −7.0 ± 6.4 Gt yr⁻¹. Loss of water would be consistent with a recent study that concluded that Canada’s subarctic lakes are vulnerable to drying when snow cover declines and that recent bouts of drying may be unprecedented in the past 200 years. On the other hand, precipitation was 102% of normal during the GRACE period, and a 17% increase is predicted by the end of the century.

The wetting trend in the northern Great Plains (region 20), 20.2 ± 4.8 Gt yr⁻¹, arises from a combination of deep drought during 2001-2003, which depressed water levels greatly at the start of the GRACE period, followed by nine of the next eleven years having greater-than-average precipitation, including flooding in 2010-2011. The trend is likely to diminish over time, although a 7% increase in precipitation is predicted by 2100.

A historically severe drought centred in southern California (region 21) that began in 2007 (ignoring a wet 2010) and consequent increases in groundwater demand conspired to diminish TWS at a rate of −4.2 ± 0.4 Gt yr⁻¹. Although atmospheric rivers replenished California’s surface waters during 2016-2017 and policy changes have been enacted, it is doubtful that aquifer storage will recover completely without large usage reductions, in part because dewatering of aquifer materials can cause compaction of sediments, thus reducing aquifer capacity irrevocably. In the Central Valley, which provides one-third of the vegetables and two-thirds of the fruits and nuts grown in the US, annual water demands for agriculture have exceeded renewable water resources since the early 20th century. Groundwater well observations that extend back to 1962 suggest that each successive drought causes groundwater levels to step down to a new normal range without full recovery, as in regions 15 and 16. Declining winter snowpack in the Sierra Nevada, including a 500-year low in 2015, is a major concern because it is the main source of the region’s surface-water supply and groundwater recharge.

Sporadic droughts in region 22, which encompasses parts of the southern High Plains and Texas, produced an apparent trend of −12.2 ± 3.6 Gt yr⁻¹ during the GRACE period. In this case we forecast partial replenishment. Large precipitation variations caused TWS to seesaw between high and low (Extended Data Fig. 7). Heavy rains that led to flooding in parts of Texas and Oklahoma in May 2015, October 2015 and June 2016 ended the most recent drought and reduced the linear rate of TWS decline during the GRACE period. On the other hand, withdrawals of groundwater to support irrigated agriculture that exceed recharge in the central and southern High Plains aquifer have persisted for decades and will continue until the resource is exhausted or management policies change. The fringes of the aquifer have already run dry in places, and recent estimates predict that the southern High Plains aquifer could be depleted within 30 years. Despite this situation, entrenched water rights are likely to preserve the status quo until the damage forces the hands of policymakers and stakeholders.




South America 

Melting of the Patagonian ice fields (region 23) has previously been documented using altimetry and GRACE. On the basis of our analysis (see Methods for details), TWS loss is occurring at a rate of −25.7 ± 5.1 Gt yr⁻¹. In a warming world, melting of the Patagonian ice fields will continue until they are exhausted.

The magnitude-8.8 Maule (Chile) earthquake that occurred on 27 February 2010 is partly responsible for the apparent trend in central Argentina (region 24). A model has not yet been developed to properly separate its effect from TWS variations after that date (Extended Data Fig. 3). TWS had previously been declining at a rate of −8.6 ± 1.2 Gt yr⁻¹. The region received substantially elevated precipitation in five of the six years between 1999 and 2004, producing a TWS surplus at the start of the GRACE period. Multi-year drought began in 2009, resulting in a negative trend observed from April 2002 to February 2010. TWS appears to have begun recovering (Extended Data Fig. 3) in response to above-normal precipitation in 2014 and 2015 (Extended Data Fig. 7), and we envisage that it will return to mean wetness conditions over time.
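Because the coseismic signal of the Maule earthquake is not modelled, the region 24 trend in Table 1 is fitted only over April 2002 to February 2010. The sketch below shows that kind of windowed least-squares fit. The series is hypothetical: a steady −8.6 Gt yr⁻¹ decline sampled mid-month, plus an invented +30 Gt offset after the earthquake date. The fractional-year helper and function names are illustrative choices, not the paper's code.

```python
from datetime import date

def decimal_year(d: date) -> float:
    """Calendar date as a fractional year (e.g. 2 July 2005 is about 2005.5)."""
    start, end = date(d.year, 1, 1), date(d.year + 1, 1, 1)
    return d.year + (d - start).days / (end - start).days

def windowed_slope(dates, values, start, end):
    """OLS slope (value units per year) using only samples with start <= date <= end."""
    pts = [(decimal_year(d), v) for d, v in zip(dates, values) if start <= d <= end]
    t_mean = sum(t for t, _ in pts) / len(pts)
    v_mean = sum(v for _, v in pts) / len(pts)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in pts)
    sxx = sum((t - t_mean) ** 2 for t, _ in pts)
    return sxy / sxx

# Hypothetical monthly mass anomaly (Gt), April 2002 to March 2016, sampled mid-month.
QUAKE = date(2010, 2, 27)
month_index = [2002 * 12 + 3 + k for k in range(168)]          # k = 0 is April 2002
dates = [date(m // 12, m % 12 + 1, 15) for m in month_index]
values = [-8.6 * (decimal_year(d) - 2002) + (30.0 if d > QUAKE else 0.0) for d in dates]

pre_quake = windowed_slope(dates, values, date(2002, 4, 1), date(2010, 2, 28))
full_record = windowed_slope(dates, values, dates[0], dates[-1])
print(f"Apr 2002-Feb 2010: {pre_quake:.1f} Gt/yr; full record: {full_record:.1f} Gt/yr")
```

Restricting the window recovers the imposed −8.6 Gt yr⁻¹. Fitting the full record lets the unmodelled offset bias the slope.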

TWS increased during the GRACE period in central and western Brazil and its neighbours (region 25) at a rate of 51.9 ± 9.4 Gt yr⁻¹. The region received less-than-average rainfall in every year from 2001 to 2005, followed by greater-than-average rainfall in six of the next ten years. As a result, TWS recovered from the early-period drought and exhibited a massive, but transitory, increasing trend, which may have already ended (Extended Data Fig. 3). The magnitude of this trend is explained by both the size of the region and the intensity of the Amazon water cycle. Still, we note that southern Brazil is a hotbed of dam construction, and it is possible that the filling of reservoirs contributed to the upward trend. Eastern Brazil (region 26) has recently suffered from a major drought, including well-below-normal rainfall in 2012, 2014 and 2015, causing TWS to plunge at a mean rate of −16.7 ± 2.9 Gt yr⁻¹ during the GRACE period. In both cases, assuming precipitation rates revert towards (or oscillate around) their long-term means, the observed trends should fade. In fact, owing to the recent strong El Niño, 2015 was the driest year in the 37-year record for region 25 (Extended Data Fig. 3), which may portend a reversion to average TWS.


Africa 

Six apparent trends stand out in Africa. In southern Africa, a powerful wetting trend, 29.5 ± 3.5 Gt yr⁻¹, is observed in the western Zambezi basin, the Okavango delta and areas west of the coast (region 27). This region experienced a remarkable change in its hydroclimate. The area-averaged annual rainfall was less than 970 mm in every year from 1979 to 2005. That threshold was exceeded five times from 2006 to 2011. A permanent climatic shift was previously speculated on the basis of a significant decrease in annual precipitation in 1950-1975 and 1980-2005. With ten years of additional hindsight, it appears that the region may have simply endured a prolonged drought from the late 1970s to the early 2000s. Thus, we attribute the GRACE-period trend to natural variability. Although TWS appears to have peaked in 2012 (Extended Data Fig. 4), considering that the previous wet and dry periods lasted upwards of 25 years, it is plausible that the wetting trend could resume.

An apparent trend of 21.9 ± 3.9 Gt yr⁻¹ occurs along the headwaters of the White Nile and Blue Nile rivers, including lakes Tanganyika and Victoria (region 28). Altimetry data indicate that during the study period both lakes experienced minimum water levels in 2006 and that their annual mean levels increased by 62 mm yr⁻¹ and 40 mm yr⁻¹ on average, respectively; these observations are consistent with the TWS time series. Together, the two lake-level trends equate to less than a quarter (4.8 Gt yr⁻¹) of the observed TWS trend. Considering that, rainfall would seem to be the primary driver of TWS variations, while management of the large lakes and dam building in the northern part of the region also contribute. However, rainfall is not particularly well correlated with either TWS or lake levels. The lack of correlation may be indicative of inaccuracies stemming from the sparsity of rain gauges in the region. The observed rainfall trend was negligible during the study period, but a 12% increase is predicted by 2100. The northern part of region 28 encompasses the Grand Ethiopian Renaissance Dam on the Blue Nile River at Ethiopia's northwest border with Sudan, which Egypt has strongly denounced because of the possibility of reduced flow through the Nile. Construction of the dam began in 2011 and is ongoing. Filling of the 74-km³ reservoir will probably produce a temporary increasing TWS trend in its immediate vicinity.
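The 4.8 Gt yr⁻¹ lake contribution follows from multiplying each level trend by the lake's surface area. The text does not state the areas it used. The sketch below assumes commonly quoted values of about 32,900 km² for Lake Tanganyika and about 68,800 km² for Lake Victoria; some sources give a smaller figure, near 60,000 km², for Victoria. It also treats the shores as vertical, so that surface area stays constant as the level changes.

```python
# Approximate surface areas (km^2): assumed literature values, not given in the text.
LAKE_AREA_KM2 = {"Tanganyika": 32_900, "Victoria": 68_800}
# Mean level trends (mm/yr) for the study period, as reported in the text.
LEVEL_TREND_MM_PER_YR = {"Tanganyika": 62.0, "Victoria": 40.0}
REGION_28_TREND_GT_PER_YR = 21.9

def level_trend_to_gt_per_yr(level_mm_per_yr: float, area_km2: float) -> float:
    """Water mass rate implied by a lake-level trend, assuming a constant surface
    area and a density of 1,000 kg/m^3, so that 1 km^3 of water is 1 Gt."""
    volume_m3_per_yr = (level_mm_per_yr / 1000.0) * (area_km2 * 1e6)
    return volume_m3_per_yr / 1e9   # m^3 -> km^3, numerically equal to Gt

total = sum(level_trend_to_gt_per_yr(LEVEL_TREND_MM_PER_YR[lake], area)
            for lake, area in LAKE_AREA_KM2.items())
print(f"lakes: {total:.1f} Gt/yr = {total / REGION_28_TREND_GT_PER_YR:.0%} of the TWS trend")
# lakes: 4.8 Gt/yr = 22% of the TWS trend
```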

TWS has been increasing in tropical western Africa (region 29) at a rate of 24.1 ± 2.1 Gt yr⁻¹. Precipitation was 3% below normal in 2000-2002 and 3% above normal during the rest of the GRACE period. This appears to be the primary cause of TWS accumulation, although the possible contribution of the many dams being built in this part of Africa is unknown. Because interannual variability of rainfall is substantial in the region, disregarding the dams it is likely that the change rate of TWS will oscillate around zero over the coming decades. By 2100, rainfall is predicted to decrease by 6%; hence, the dam construction may be timely.

Decreasing TWS (−7.2 ± 1.0 Gt yr⁻¹) in region 30, which extends from the coast of central Africa into the northern Congo River basin, seems to be caused by natural interannual variability, although it has been suggested that the surface runoff rate has been enhanced by deforestation. Between 1999 and 2002 rainfall averaged 4% above normal, while it averaged 1% below normal during the rest of the GRACE period, including two very dry years in 2014 and 2015. The decrease in TWS is also consistent with the postulated negative correlation between TWS in the Amazon and Congo basins, which further implicates large-scale climatic oscillation as the ultimate driver.

The negative trend along the coast of southeastern Africa (region 31), −12.9 ± 2.3 Gt yr⁻¹, reflects a recent severe drought, which has caused major food shortages. Rainfall was 4% below average during the GRACE period, including annual accumulations that were below normal in five of the last eight years and barely above normal in the other three. Water levels in Lake Malawi, which is in the centre of the region, are well correlated with regional TWS. The lake declined at a mean rate of 78 mm yr⁻¹ during the period, accounting for 2.3 Gt yr⁻¹ of the observed TWS trend. Hence, it is likely that the apparent trend is primarily caused by natural variability, although a 6% decrease in rainfall is predicted during this century.

A weak negative trend, −11.7 ± 2.9 Gt yr⁻¹, extends across arid Africa north of 19° N, excluding Morocco (region 32). The coefficient of determination is not large at 0.45; nevertheless, precipitation during the GRACE period was 7% above normal, which suggests that the consumptive use of fossil groundwater to stimulate agriculture and economic development is the cause. Three studies estimated recent rates of consumptive groundwater use across North Africa to be 7.8 Gt yr⁻¹, 15.7 Gt yr⁻¹ and 4.1 Gt yr⁻¹, bracketing our TWS depletion estimate.


Australia 

Australia appears to be bipolar with respect to water storage during the GRACE era, with wetting in the east and north and drying in the northwest. The worst drought in over 100 years afflicted eastern Australia during 2001-2009. It is likely that groundwater was more heavily consumed during that time to compensate for reduced availability of surface waters. Recovery from the drought began with heavy rains in 2010 and transitioned to severe flooding in 2011, with so much water stored on the continent in 2012 that the global mean sea level temporarily declined. The shift from dry to wet conditions caused the apparent wetting trend in region 33, 19.0 ± 2.8 Gt yr⁻¹, but most of that water had already been shed by 2016 (Extended Data Fig. 4). Northern Western Australia received greater-than-normal rainfall during every year from 1997 to 2001, including the two wettest years in the GPCP record in 2000 and 2001. Thus, region 34 began 2002 near the maximum TWS capacity, and it gradually returned to average (−8.9 ± 1.2 Gt yr⁻¹) with 99% of normal precipitation during the GRACE period. It is possible that aquifer dewatering associated with Pilbara’s mining industry also contributed, but reliable data are not available to confirm and quantify that contribution. We can only justifiably conclude that natural variability is the primary explanation for both Australian trends.


Implications and discussion 

GRACE has revealed considerable changes in freshwater resources 
occurring across the globe and has allowed them to be quantified 
at regional scales, unimpeded by sparse measurements or restrictive 
data-access policies. Some of these changes are manifestations of 
human water management that, before GRACE, were known only 
anecdotally, including TWS depletion in northern India, the North 
China Plain and the Middle East (regions 7, 12 and 14-16), or not 
at all, as in northwestern China (region 11). These changes portend 
a future in which already limited water resources will become even 
more precious. Others correlate well with global warming and pre- 
dicted future precipitation changes, including worldwide ice-sheet and 
glacier melt (regions 1-4 and 23) and TWS increases in the northern 
high latitudes (regions 5-6). Apparent TWS trends in about one-third 
of the study regions represent partial cycles of longer-term interannual 
oscillations and may fade or reverse over the decades (see green dots in 
Fig. 1). Although we have made every effort to attribute the apparent 
trends properly, they will all require continued observation to better 
understand their causes and constrain their rates. 

The GRACE data provide motivation for multilateral cooperation 
among nations, states and stakeholders, including development of trans- 
boundary water-sharing agreements, to balance competing demands 
and defuse potential conflict. Government policies that incentivize
water conservation could help to avert a ‘tragedy of the commons’ sce- 
nario, that is, opportunistic competition for groundwater outweigh- 
ing the altruistic impulse to preserve the resource. Northern India, 
the North China Plain, the Middle East and the area surrounding the 
Caspian Sea are already on a perilous path, while California, in response 
to severe drought and alarming groundwater declines in the Central 
Valley, recently passed legislation to regulate groundwater consumption. 
In many regions, crop irrigation on massive scales has been supported by unsustainable rates of groundwater abstraction. In the face of aquifer depletion, population growth and climate change, water and food security will depend upon water-saving technologies and improved management and governance. The success of such an approach in arid Israel proves that a comprehensive water conservation strategy can work, and there are encouraging signs in Saudi Arabia (as previously discussed) and parts of India. Meanwhile, as China
looks to improve living standards for its 1.38 billion residents, it will 
continue to face daunting water-management decisions, many of which 
are related to massive geoengineering and water-diversion projects that 
are likely to trigger political tensions. 

The GRACE data also call attention to regions where continued monitoring will be essential for distinguishing, understanding and quantifying climate change impacts on the water cycle, and on groundwater in particular. This is important for two reasons. First, verification of emerging hydroclimatic trends, such as increasing northern high-latitude precipitation, would raise confidence in the ability of climate models to predict water-cycle consequences of climate change. Second, a redistribution of freshwater from dry to wet regions, as has been forecast, could exacerbate disparities between the water ‘haves’ and ‘have-nots’ and associated political instability, migration and conflict. Most groundwater depletion is occurring within Earth’s mid-latitudes, resulting in a positive drying feedback that is accelerating water losses and the severity of related socioeconomic issues.

New and future satellite remote-sensing missions that extend the
long-term record of global hydrological observations will be essential
for continued assessment of changing freshwater availability. In
particular, the GRACE Follow-On mission (planned to launch in early 2018),
while affording a small increase in spatial resolution and accuracy,
will enable surveillance of the trends described here and improved
disentanglement of natural TWS variability from hydroclimatic change.
Awareness of changing freshwater availability (for example, Fig. 1) is
the first step towards addressing the challenges discussed here through
improved infrastructure, water use efficiency, lifestyle and
water-management decisions and policy.
METHODS


GRACE data have traditionally been processed by solving for gravity anomalies in
terms of the Stokes coefficients (namely, $C_{lm}$ and $S_{lm}$, with $l$ denoting degree and
$m$ denoting order), which are the coefficients of the spherical harmonic expansion
of Earth’s gravity field. These solutions suffer from correlated errors that
manifest as longitudinal striping in the gravity solution, which requires tailored
‘destriping’ and smoothing post-processing filters to remove. Although largely
successful in removing errors, the post-processing also damps and smooths real
geophysical signals. Recent advances in GRACE data processing have shown that
solving for gravity anomalies in terms of mass concentration (mascon) functions
with carefully selected regularization results in superior localization of signals on
an elliptical Earth. For instance, mascon solutions correlate better with
in situ ocean-bottom pressure recorders than spherical-harmonic solutions,
improve the spatial resolution of mass changes in Greenland and were used to
detect changes in the Atlantic meridional overturning circulation. Currently,
there are three publicly available GRACE mascon solutions: Jet Propulsion
Laboratory mascons RL05M.1 version 2 (JPL-M), Center for Space Research
mascons RL05M (CSR-M) and Goddard Space Flight Center mascons version
2.3b (GSFC-M). JPL-M parameterizes the gravity field with 4,551 equal-area 3°
mascon elements, whereas CSR-M and GSFC-M both parameterize the gravity
field in terms of 1° mascon elements (~41,000 mascon elements are solved for
in each solution). Although the implementation details of each mascon solution
differ, we note that the JPL-M solution has the unique characteristic that each 3°
mascon element is relatively uncorrelated with neighbouring mascon elements,
whereas the 1° mascon elements in CSR-M and GSFC-M solutions are highly
correlated with their neighbours. Three degrees correspond approximately to the
‘native’ resolution of GRACE, and the lack of correlation between neighbouring
mascon elements in the retrieval allows for a quantitative understanding of leakage
errors when aggregating mass anomalies within a hydrological basin; in fact, no
literature yet exists on quantifying leakage errors in 1° mascon solutions. Therefore,
in this work we use JPL-M for trend analysis and mapping; however, we use all
three mascon solutions (JPL-M, CSR-M and GSFC-M) to derive uncertainties.

The JPL-M solution parameterizes each monthly gravity field in terms of 4,551
equal-area, surface spherical-cap mass-concentration functions and uses a
regularization approach that implements both spatial and temporal correlations to remove
correlated errors during the gravity inversion. A coastline resolution improvement
filter is used to separate land and ocean mass within mascons that
span coastlines. Because GRACE does not produce a reliable estimate of Earth’s
oblateness (the $C_{20}$ coefficient), we follow the standard protocol of using satellite laser
ranging to provide this estimate. Further, GRACE gravity-field anomalies are
measured in the centre-of-mass reference frame of Earth and therefore need to be
augmented with a ‘geocentre’ estimate to capture all surface-mass changes. GIA
corrections are made using the updated ICE-6G_D model, with an exception
for Antarctica, for which we reduce the fitted rate of mass change by 9.2 Gt yr⁻¹
on the basis of a regional model that potentially provides a better GIA estimate
for Antarctica. Finally, corrections are made to the $C_{21}$ and $S_{21}$ coefficients
(degree-2 coefficients are related to the moments and products of inertia with
respect to a defined reference frame and set of background force models) to fully
remove the pole tide from the GRACE data. Jumps in the background atmosphere
and ocean dealiasing product are also corrected.

Prior to computing the best-fit linear trend from a TWS time series, the seasonal 
cycle was removed as follows. First, missing months of data were filled by linear 
interpolation. Next, the mean monthly seasonal cycle was computed by averaging 
all Januaries, all Februaries, etc. Finally, for each month in the original, non-gap- 
filled time series, the mean for the corresponding month of the year was subtracted. 
The first step (gap filling) was necessary because, for example, the month of May 
was under-sampled in the second half of the study period, which caused the mean 
May to be biased in locations where a consistent trend existed (that is, most of the 
regions of this study). 
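The gap-filling and deseasonalizing procedure above can be sketched in Python with NumPy. This is a minimal illustration, not the authors’ code (the study used MATLAB scripts and Excel spreadsheets). The function names and the synthetic test series are hypothetical. Note that `np.interp` holds the first or last valid value constant if a gap falls at the very start or end of the record.

```python
import numpy as np

def remove_seasonal_cycle(tws, month_of_year):
    """Non-seasonal anomalies of a monthly series whose gaps are NaN.

    tws:           1-D array of monthly TWS values (NaN = missing month).
    month_of_year: integer array, 0 = January ... 11 = December.
    """
    tws = np.asarray(tws, dtype=float)
    month_of_year = np.asarray(month_of_year)
    t = np.arange(tws.size)
    valid = ~np.isnan(tws)

    # Step 1: fill missing months by linear interpolation in time
    filled = tws.copy()
    filled[~valid] = np.interp(t[~valid], t[valid], tws[valid])

    # Step 2: mean seasonal cycle (all Januaries, all Februaries, ...)
    climatology = np.array([filled[month_of_year == m].mean() for m in range(12)])

    # Step 3: subtract it from the ORIGINAL series, so gaps stay NaN
    return tws - climatology[month_of_year]

def linear_trend(anomalies, t):
    """Least-squares slope of anomalies against time, skipping NaN gaps."""
    ok = ~np.isnan(anomalies)
    slope, _intercept = np.polyfit(np.asarray(t)[ok], anomalies[ok], 1)
    return float(slope)
```

The climatology is computed from the gap-filled series but subtracted from the original one. As a result, missing months stay missing in the anomaly series and are skipped by the trend fit.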

Trend error estimates account for both systematic and random GRACE measurement
errors, as well as the systematic error of the GIA model. The GRACE
measurement error is taken to be 1σ, where σ is the standard deviation between
trend estimates obtained from JPL-M, CSR-M and GSFC-M. Given the specific
basin boundaries used in this study, we find JPL-M to have more pronounced
trends (both positive and negative) than CSR-M and GSFC-M, which is consistent
with previous conclusions. This spread is due to a fundamental difference in the
spectral content between the 3° mascons and 1° mascons, implying that leakage
characteristics differ when aggregating mass anomalies over a particular
region (somewhat counter-intuitively, the 3° mascons ‘focus’ more signal than
the 1°-sampled mascons). In essence, the ‘smooth’ nature of the 1° mascon
solutions (CSR-M and GSFC-M) results in considerable damping of the signal over
our regions of interest owing to leakage across the basin boundaries. For a more
direct comparison of the three solutions over our regions of interest, we matched
the spectral content of JPL-M to that of CSR-M. The regularization of the CSR
mascon solution is based on a smoothed (using a 200-km Gaussian) representation
of a regularized spherical-harmonic solution. Hence, the final mascon solution is
expected to inherit some of these spectral characteristics. Therefore, we
smooth JPL-M with a 200-km-radius Gaussian filter and compare the trend
estimates of the smoothed version of JPL-M with those of CSR-M and GSFC-M. With this
smoothing, the agreement is substantially improved, and trends in the smoothed version of
JPL-M are damped similarly to those in CSR-M and GSFC-M (see Extended Data
Fig. 9 for an example). A similar analysis has been performed before in a study of
mass variations over the Caspian Sea. We use the standard deviation of trend
estimates obtained from the smoothed version of JPL-M, CSR-M and GSFC-M to
derive the GRACE measurement errors. The GIA model error is taken to be the 1σ
spread between four competing GIA models that implement two distinct
loading histories, four distinct viscosity profiles and different implementations of
physics. The uncertainty on the trend for any region is given by the root sum of
squares combining the GIA model error (which manifests only as a trend) and the
GRACE measurement error.
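The uncertainty calculation above has two ingredients: the 200-km Gaussian smoothing applied to JPL-M, and the root-sum-of-squares combination. Both can be sketched as follows, with several caveats.

- The spatial-domain averaging kernel uses the widely used Jekeli/Wahr form, whose weight halves at the stated radius. This may differ from the filter implementation the authors used.
- The text does not say whether σ is a sample or a population standard deviation, so `ddof=1` is an assumption.
- The mean Earth radius constant and all function names are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed value)

def gaussian_weight(alpha, radius_km=200.0):
    """Spherical Gaussian weight at angular distance alpha (radians).

    Jekeli/Wahr-style kernel: the weight drops to half of its central
    value at the half-width radius_km.
    """
    b = np.log(2.0) / (1.0 - np.cos(radius_km / EARTH_RADIUS_KM))
    return b / (2 * np.pi) * np.exp(-b * (1 - np.cos(alpha))) / (1 - np.exp(-2 * b))

def smooth_grid(field, lat_deg, lon_deg, radius_km=200.0):
    """Gaussian-smooth a regular lat/lon grid (spatial-domain sketch)."""
    lat, lon = np.radians(lat_deg), np.radians(lon_deg)
    LAT, LON = np.meshgrid(lat, lon, indexing="ij")
    cell_area = np.cos(LAT)  # relative area of each grid cell
    out = np.empty(field.shape, dtype=float)
    for i in range(lat.size):
        for j in range(lon.size):
            # great-circle angular distance from cell (i, j) to every cell
            cos_alpha = (np.sin(lat[i]) * np.sin(LAT)
                         + np.cos(lat[i]) * np.cos(LAT) * np.cos(LON - lon[j]))
            alpha = np.arccos(np.clip(cos_alpha, -1.0, 1.0))
            w = gaussian_weight(alpha, radius_km) * cell_area
            out[i, j] = np.sum(w * field) / np.sum(w)  # normalized weighted mean
    return out

def trend_uncertainty(grace_trends, gia_trends):
    """Root sum of squares of the GRACE spread and the GIA-model spread.

    grace_trends: regional trends from smoothed JPL-M, CSR-M and GSFC-M.
    gia_trends:   regional trends implied by the competing GIA models.
    """
    sigma_grace = np.std(grace_trends, ddof=1)  # assumption: sample std
    sigma_gia = np.std(gia_trends, ddof=1)
    return float(np.hypot(sigma_grace, sigma_gia))  # sqrt(a**2 + b**2)
```

Normalizing by the summed weights means that a constant field is returned unchanged. Cells near the edge of a regional grid are simply averaged over the neighbours that exist.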

Time series for the Aral and Caspian seas (regions 17 and 18) were calculated
by applying a set of gain factors to the GRACE data. Gain factors redistribute
mass within each individual mascon (at sub-mascon resolution), allowing exact
averaging kernels to be applied to a region of interest and retrieval of accurate,
unbiased (by leakage) mass-change values. These particular gain factors were
derived using a combination of total-column soil moisture output from the
Noah land-surface model driven by the Global Land Data Assimilation System
(which does not include sea-water variations) along with altimetry data over
the Aral and Caspian seas.
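The idea of a gain factor can be illustrated with a simple least-squares version. The sketch below is hypothetical and is not necessarily how the cited gain factors were derived.

- It assumes a model field is available on equal-area sub-mascon cells. This field stands in for the land-surface-model and altimetry combination described above.
- It fits one factor per cell against the mascon-mean model series.
- It then uses those factors to redistribute a GRACE mascon series.

For equal-area cells the factors average to exactly 1, so the mascon total is preserved.

```python
import numpy as np

def gain_factors(model_cells):
    """Least-squares gain factor for each sub-mascon cell (hypothetical helper).

    model_cells: array (n_months, n_cells) of model mass anomalies on the
                 equal-area cells that make up ONE mascon.
    Returns g with model_cells[:, i] ~= g[i] * mascon_mean for every cell i.
    """
    mascon_mean = model_cells.mean(axis=1)  # what a mascon-resolution retrieval "sees"
    return model_cells.T @ mascon_mean / (mascon_mean @ mascon_mean)

def downscale(grace_mascon, g):
    """Redistribute a GRACE mascon time series onto its sub-mascon cells."""
    return np.outer(grace_mascon, g)  # shape (n_months, n_cells)
```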

Recent variations in Caspian Sea level have been attributed by previous studies
to natural meteorological variability and to direct evaporation from the sea surface.
We tested these two theories as well as a third, agricultural water consumption.
Flow in the Volga River, which delivers roughly 80% of the runoff to the Caspian
Sea, is controlled by a series of eleven dams. Among other purposes, these ensure
a steady supply of water for crop irrigation. No data were available to quantify
interannual variations in irrigation extent, intensity or volumes in the Caspian
Sea drainage basin during the study period. Estimates of Russian annual wheat,
maize, rice and soybean production (in tonnes) during 1992–2015 were obtained
from the Organisation for Economic Co-operation and Development (OECD).
According to the irrigation dataset, the Volga River basin, which drains to the
Caspian Sea, includes 3% irrigated crops and 37% rain-fed crops by area, and it
accounts for about half of all Russian crop production. Therefore, Russian crop
production is a fair, but imperfect, indicator of agricultural water demand in the
basin. Yearly total production was normalized by subtracting the 24-year mean
and dividing by the standard deviation. Normalization was similarly performed
on the annual time series of GPCP precipitation over the Caspian Sea and Volga
River drainage basins, the Volga River discharge, reanalysis-based Caspian Sea
evaporation and changes in Caspian Sea level obtained from satellite altimetry.
Correlation coefficients (and significance levels) between normalized Caspian Sea
level change and its significant drivers (Extended Data Fig. 10) were 0.78 (Volga
River discharge; P < 0.001), −0.47 (crop production; P = 0.02), −0.43 (Caspian
Sea evaporation; P = 0.04) and 0.41 (Caspian Sea drainage basin precipitation;
P = 0.05). Correlation coefficients (and significance levels) between normalized
Volga River discharge and its significant drivers were 0.52 (Volga River basin
precipitation; P = 0.01) and −0.40 (crop production; P = 0.06). Notably, the
correlation between crop production and precipitation was negligible, suggesting that
irrigation effectively mitigates the impact of drought. Interannual variations in
Caspian Sea evaporation do indeed contribute significantly to Caspian Sea level
changes. However, annual Volga River discharge variations are better correlated
with annual changes in Caspian Sea level, are larger than variations in Caspian
Sea evaporation (standard deviation of 48 Gt versus 18 Gt, compared with a mean
magnitude of annual Caspian Sea level change of 38 Gt) and are controlled by
both precipitation and rising agricultural water demand. We therefore conclude
that all three factors contributed to the observed water loss (−23.7 ± 4.2 Gt yr⁻¹
from GRACE, ignoring steric effects; −25.4 Gt yr⁻¹ from satellite altimetry) during
2002–2015.
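The normalization and correlation steps described above can be sketched with NumPy alone. Two points are assumptions. First, the standard deviation uses `ddof=1`, because the text does not specify. Second, the significance level is assumed to come from the usual t statistic with n − 2 degrees of freedom. A two-sided P value could then be taken from the Student t distribution, for example as `2 * scipy.stats.t.sf(abs(t), n - 2)`. The function names are hypothetical.

```python
import numpy as np

def normalize(x):
    """Subtract the period mean and divide by the standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)  # assumption: sample std

def correlation_with_t(x, y):
    """Pearson r between two annual series and its t statistic (n - 2 dof)."""
    zx, zy = normalize(x), normalize(y)
    n = zx.size
    r = float(np.sum(zx * zy) / (n - 1))  # mean product of z-scores = Pearson r
    t = r * np.sqrt((n - 2) / (1.0 - r**2))
    return r, t
```

Normalization does not change r. It only puts series with different units (Gt, mm, tonnes) on a common, dimensionless scale.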

For the Gulf of Alaska coast and the Patagonian ice fields (regions 3 and 23), it
was also necessary to increase the rates of mass loss (by 7 Gt yr⁻¹ and 9 Gt yr⁻¹,
respectively) to account for Little Ice Age GIA. We note that the full GIA
corrections to Antarctica, the Gulf of Alaska coast and the Patagonian ice fields are not
incorporated into Extended Data Figs 1 and 3.

The irrigated area fractions (Table 1) were computed by area-weighted averaging
of the individual pixel values of irrigation intensity (%) within each study region.
Precipitation trends (mm yr⁻¹) were computed on the basis of monthly data, as
above for TWS, except that there were no gaps to fill. Precipitation trends (% per
year) and percentages of normal precipitation were computed using the 1979–2015
annual mean precipitation totals for each region. Predicted precipitation changes
were computed as area-weighted averages from the IPCC dataset over the study
regions. The precipitation maps in Fig. 2 were computed as above, but on a
pixel-by-pixel basis.
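Area-weighted averaging over a study region might look like the following. The sketch also converts a precipitation trend to per cent per year. It assumes a regular latitude–longitude grid, for which cell area is proportional to the cosine of latitude, and a boolean region mask. The function names are hypothetical.

```python
import numpy as np

def area_weighted_mean(values, lat_deg, region_mask):
    """Area-weighted mean of gridded values inside a region.

    values:      2-D array (lat, lon), e.g. irrigation intensity in %.
    lat_deg:     1-D array of cell-centre latitudes.
    region_mask: boolean array, True for cells inside the study region.
    """
    weights = np.broadcast_to(np.cos(np.radians(lat_deg))[:, None], values.shape)
    use = region_mask & ~np.isnan(values)  # ignore cells outside or without data
    return float(np.sum(weights[use] * values[use]) / np.sum(weights[use]))

def percent_per_year(trend_mm_per_yr, mean_annual_mm):
    """Express a precipitation trend relative to the 1979-2015 annual mean."""
    return 100.0 * trend_mm_per_yr / mean_annual_mm
```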

The explanation for the mass-loss trend in northwestern China (region 11),
−5.5 ± 0.5 Gt yr⁻¹, is complex. Drought was not a factor, given that precipitation
was 10% above normal and stable during the period. Two recent studies
estimated the rate of glacier loss over the entire Tien Shan mountain range to be
−5.4 ± 2.9 Gt yr⁻¹ and −7.5 ± 3.4 Gt yr⁻¹ on the basis of Ice, Cloud and Land Elevation
Satellite (ICESat) observations from 2003 to 2009. These estimates are somewhat
smaller than our GRACE-based estimate of TWS decline in region 11 during that
period (−8.3 ± 0.8 Gt yr⁻¹), despite region 11 encompassing less than half of the
area of glacier melt. Hence, we conjecture that an additional catalyst for mass
loss must exist. Xinjiang province is one of the world’s largest producers of coal,
having an estimated 2.2 trillion tons of reserves. Reported rates of coal removal
and burning are more than an order of magnitude smaller than the GRACE-observed
mass loss, but mining involves dewatering of the aquifers that the
mines intersect. Consequent groundwater depletion in the area is possible but
unconfirmed. Adding to the complexity, region 11 lies within a larger endorheic
basin, which means that water pumped from the ground or melting from glaciers
will remain as surface water, become groundwater recharge or evapotranspire,
rather than flowing to the ocean. However, on the basis of satellite altimetry
data, the elevations of the five lakes within the surrounding endorheic basin did
not increase during the study period; all either declined or did not change
significantly. The two lowlands into which region 11 drains (one to the northwest,
one to the southeast) have GRACE-based trends of 0.3 Gt yr⁻¹ and −0.6 Gt yr⁻¹
(both insignificant). Ultimately, evapotranspiration must account for the water
lost from region 11. The average annual precipitation in region 11 is 194 mm yr⁻¹,
making it the fourth-driest of the 32 study regions. The endorheic basin is
extensively irrigated, including 7% of region 11, and irrigation intensity is likely rising
in support of Xinjiang province’s population growth (from 18.2 million in 2000
to 21.8 million in 2010). Massive amounts of surface water from Lake Bosten
and the Kongque River (both to the southeast of region 11) are transferred via
aqueducts southwards to the Tarim River to support farming in the arid plains;
however, the Tarim River runs dry before reaching its natural terminus, Lop Nor
lake. To summarize, the Tien Shan mountain glaciers in region 11 are shrinking
because of global warming. Groundwater may be declining owing to agricultural
withdrawals or mining operations, but the latter is unconfirmed. Because region
11 lies within an endorheic basin, neither glacier melt nor groundwater pumping
can alone explain the observed TWS depletion. The corollary is that the resulting
additions to surface water are balanced by desert- and irrigation-enhanced
evapotranspiration.

As noted in the main text, although previous GRACE-based studies of the North
China Plain (region 12) agreed that groundwater depletion associated with intense
irrigation was the cause of the trend, they offered a wide range of estimates of the
TWS or groundwater trend. Specifically, these estimates were −8.3 Gt yr⁻¹ over a
370,000-km² area, −35 Gt yr⁻¹ over a 2,086,000-km² area, −2.33 Gt yr⁻¹ over
a 370,000-km² area and −14.09 Gt yr⁻¹ over a 1,500,000-km² area, compared
with our estimate of −11.3 Gt yr⁻¹ over an 876,004-km² area.
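The published estimates cover areas that differ by more than a factor of five. Converting each mass rate to an equivalent water-layer thickness makes them easier to compare. The sketch assumes a water density of 1,000 kg m⁻³, so that 1 Gt of water occupies 1 km³. A rate of 1 Gt yr⁻¹ spread over 1 km² then corresponds to 10⁶ mm yr⁻¹. The helper name is hypothetical.

```python
def gt_per_yr_to_mm_per_yr(rate_gt_per_yr, area_km2):
    """Equivalent water-layer rate (mm/yr) for a mass rate over an area.

    Assumes 1 Gt of water = 1 km^3 (density 1,000 kg/m^3) and 1 km = 1e6 mm.
    """
    return rate_gt_per_yr / area_km2 * 1e6

# Estimates quoted above for the North China Plain: (Gt/yr, km^2)
estimates = [(-8.3, 370_000), (-35.0, 2_086_000), (-2.33, 370_000),
             (-14.09, 1_500_000), (-11.3, 876_004)]
for rate, area in estimates:
    print(f"{rate:7.2f} Gt/yr over {area:>9,} km^2 -> "
          f"{gt_per_yr_to_mm_per_yr(rate, area):6.1f} mm/yr")
```

By this measure the quoted estimates span roughly −6 to −22 mm yr⁻¹, and this study’s estimate is near −13 mm yr⁻¹.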

Data availability. Specific sources of data used in this study were the following. 
The primary GRACE TWS dataset is JPL Mascon RL05M.1 version 2, accessed
on 3 February 2017 from https://grace.jpl.nasa.gov/data/get-data/jpl_global_mascons/.
Additional GRACE TWS datasets used to estimate errors were CSR
RL05 Mascon version 1, accessed on 20 September 2017 from
http://www2.csr.utexas.edu/grace/RL05_mascons.html, and GSFC Mascon version 2.3b,
accessed on 5 October 2017 from https://neptune.gsfc.nasa.gov/gngphys/index.php?section=413.
Primary GIA data used in this study were the ICE-6G_D model, accessed
on 1 December 2017 from http://www.atmosp.physics.utoronto.ca/~peltier/data.php,
and the IJ05_R2 GIA correction for Antarctica, accessed on 3 February 2018
from http://onlinelibrary.wiley.com/doi/10.1002/jgrb.50208/full. Additional GIA
data used to compute the GIA model error included ICE-6G_ANU_D, accessed on
3 February 2018 from http://onlinelibrary.wiley.com/doi/10.1002/2017JB014930/full,
the A et al. (2013) GIA model, accessed on 16 December 2013 from
ftp://podaac-ftp.jpl.nasa.gov/allData/tellus/L3/pgr/, and the Paulson et al. (2007)
GIA model, accessed on 3 February 2018 from
https://academic.oup.com/gji/article/171/2/497/2018541. Atmosphere and ocean
dealiasing product jump corrections were accessed on 13 June 2016 from
ftp://podaac-ftp.jpl.nasa.gov/allData/grace/docs/. Precipitation data from GPCP
version 2.3 were accessed on 23 September 2016 from
https://www.esrl.noaa.gov/psd/data/gridded/data.gpcp.html.
Global rain-fed, irrigated and paddy croplands version-1 data were accessed on 12
September 2016 from http://ftp-earth.bu.edu/public/friedl/GRIPCmap/. Global
reservoir/lake elevation TPJO.2.3 data were accessed on 29 July 2016 from
https://ipad.fas.usda.gov/cropexplorer/global_reservoir/. Precipitation change data
predicted by the IPCC 5th Assessment Report (RCP8.5) were accessed on 1 September
2016 from https://www.ipcc.ch/pdf/assessment-report/ar5/wg1/WG1AR5_AnnexI_FINAL.pdf.
Russian crop production data were accessed on 16 August
2017 from https://data.oecd.org/agroutput/crop-production.htm. Latent heat flux
(evapotranspiration) data for the Caspian Sea and its drainage basin were extracted
from MERRA-2 version M2TMNXLND_5.12.4, accessed on 19 September 2017
from https://disc.sci.gsfc.nasa.gov/datasets/M2TMNXLND_5.12.4/summary.
Volga River discharge observations are restricted from public access, but a time 
series of normalized annual discharge values was provided to M.R. by V. Khan of 
the Hydrometeorological Research Center of the Russian Federation. 

The JPL RL05M GRACE solution used in this study is identical to that available
from the NASA/JPL GRACE Tellus website, with the exception that we imple- 
mented a different GIA model, a correction to the pole tide and corrections to 
the background atmosphere and ocean dealiasing model. These adjustments are 
available from D.N.W. upon request. Data analysed to create Extended Data Fig. 9 
are available from D.N.R. upon request. Excel spreadsheets containing the data 
and calculations used to create Table 1 and Extended Data Fig. 10 are available 
from M.R. upon request. 

Code availability. MATLAB scripts were used to prepare GRACE-based TWS
time series for the study regions, including GIA adjustments, $C_{21}$ and $S_{21}$ coefficient
replacements, and corrections for jumps in the atmosphere and ocean dealiasing
products. These are available from D.N.W. upon reasonable request. TWS time
series analyses, including trend estimation and $r^2$ computation, were performed
within Excel spreadsheets, which are available from M.R. upon reasonable request.
Extended Data Fig. 1 | Non-seasonal TWS anomalies: global regions.
a–f, Time series of monthly TWS anomalies (departures from the period
mean) from GRACE, after removing the mean seasonal cycle, averaged

Extended Data Fig. 2 | Non-seasonal TWS anomalies: Eurasia. a–l, As in Extended Data Fig. 1, for regions 7–18.

Extended Data Fig. 3 | Non-seasonal TWS anomalies: North and South America. a–h, As in Extended Data Fig. 1, for regions 19–26.

Extended Data Fig. 4 | Non-seasonal TWS anomalies: Africa and Australia. a–h, As in Extended Data Fig. 1, for regions 27–34.
Extended Data Fig. 6 | Annual precipitation totals: Eurasia. a–n, As in Extended Data Fig. 5, for regions 7–18 and the full drainage basins of the Aral and Caspian seas.

[Figure: Extended Data Fig. 6 panels, including b, Central India; d, Tibetan Plateau; h, NW Saudi Arabia; i, North China Plain; n, Caspian Sea Drainage Basin. Horizontal axis: year, 1979–2019.]
© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


[Figure: Extended Data Fig. 7 panels, including a, Central Canada; c, Southern California; e, Patagonian Ice Fields; f, Central Argentina; also Central and Western Brazil, Northern Great Plains, and Southern High Plains & Eastern Texas. Horizontal axis: year, 1979–2019.]

Extended Data Fig. 7 | Annual precipitation totals: North and South America. a–h, As in Extended Data Fig. 5, for regions 19–26.
Extended Data Fig. 8 | Annual precipitation totals: Africa and Australia. a–h, As in Extended Data Fig. 5, for regions 27–34.

Simulating the vibrational quantum
dynamics of molecules using photonics


Chris Sparrow, Enrique Martin-Lopez, Nicola Maraviglia, Alex Neville, Christopher Harrold, Jacques Carolan,
Yogesh N. Joglekar, Toshikazu Hashimoto, Nobuyuki Matsuda, Jeremy L. O’Brien, David P. Tew & Anthony Laing


Advances in control techniques for vibrational quantum states in molecules present new challenges for modelling such
systems, which could be amenable to quantum simulation methods. Here, by exploiting a natural mapping between
vibrations in molecules and photons in waveguides, we demonstrate a reprogrammable photonic chip as a versatile
simulation platform for a range of quantum dynamic behaviour in different molecules. We begin by simulating the time
evolution of vibrational excitations in the harmonic approximation for several four-atom molecules, including H2CS,
SO3, HNCO, HFHF, N4 and P4. We then simulate coherent and dephased energy transport in the simplest model of the
peptide bond in proteins (N-methylacetamide) and simulate thermal relaxation and the effect of anharmonicities in
H2O. Finally, we use multi-photon statistics with a feedback control algorithm to iteratively identify quantum states that
increase a particular dissociation pathway of NH3. These methods point to powerful new simulation tools for molecular
quantum dynamics and the field of femtochemistry.


Early electronic computers exploited analogies with acoustic, thermal 
or mechanical phenomena, such as capacitance for spring stiffness, 
to simulate a range of practically relevant physical systems. Whereas 
modern digital simulations have become versatile foundational tools 
in science and engineering, all classical computers are fundamentally 
inefficient at tackling exponentially complex microscopic behaviour 
such as the quantum dynamics of molecules. A proposed solution
is to engineer quantum mechanical components into devices that are
then inherently capable of simulating quantum systems. Here, we
demonstrate how integrated quantum photonics can be used to develop
simulation methods for molecular quantum dynamics, by building on
the analogies between optical modes in waveguides and vibrational
modes in molecules and between single photons and quantized
vibrational excitations.

Advances in the control of ultrafast molecular dynamics have
revealed the importance of quantum interference among vibrational
modes in behaviour such as bond-selective chemistry. In applying
optimal control theory to a harmonic model of chained atoms, it has
been shown in principle how a control field could drive the dynamics of
quantum interference between vibrational modes to excite local bonds.
However, laboratory demonstrations of selective bond dissociation
required adaptive feedback control to put the principles into practice.
Further control over vibrational wavepackets has enabled selective
dissociation governed by a single quantum of vibrational energy,
manipulation of individual molecules at ambient conditions, preparations of
coherent superpositions on sub-femtosecond timescales, and single
vibrational states of ultracold molecules. Molecular dynamics are now
observable on their ultrafast intrinsic timescale.

The prospect of more sophisticated control with quantum states of
light and for larger molecules increases the challenge of simulating
dynamic behaviour. Light–matter interaction with squeezed states has
been demonstrated experimentally in several contexts (see, for
example, ref. 15); enhanced spectroscopy and the control of molecules with
multi-mode, multi-photon states have been shown theoretically (see,
for example, ref. 16), with techniques for pulse shaping of quantum
states of light also demonstrated (see, for example, ref. 17). Evolving
a multi-excitation state across many vibrational modes is
computationally inefficient even for the basic model in which normal modes
are described as independent quantum harmonic oscillators. Owing to
their bosonic nature, the probability amplitudes for input–output
transitions among the modes are determined by matrix permanents, the
calculation of which is generally extremely complex. More detailed
molecular models, for example, with anharmonic corrections to the
potentials, are also likely to be computationally complex.
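The sketch below makes the role of matrix permanents concrete. It computes the probability of a given input–output pattern of indistinguishable bosons (photons or vibrational quanta) passing through an m-mode unitary. It uses Ryser’s formula, whose cost still grows exponentially with the number of excitations; this is the origin of the classical inefficiency noted above. Two things are assumptions for illustration: the convention that `U[j, k]` is the amplitude for mode k to map to mode j, and the function names.

```python
import itertools
import math
import numpy as np

def permanent(M):
    """Matrix permanent via Ryser's formula (exponential cost)."""
    M = np.asarray(M)
    n = M.shape[0]
    total = 0.0
    for k in range(1, n + 1):
        for cols in itertools.combinations(range(n), k):
            # product over rows of the row sums restricted to the chosen columns
            total += (-1) ** k * np.prod(M[:, list(cols)].sum(axis=1))
    return (-1) ** n * total

def transition_probability(U, inputs, outputs):
    """P(outputs | inputs) for bosons in a linear, number-conserving network.

    inputs/outputs: occupation numbers per mode, e.g. (1, 1, 0) or (2, 0, 0).
    """
    if sum(inputs) != sum(outputs) or sum(inputs) == 0:
        raise ValueError("need the same, non-zero number of excitations in and out")
    in_modes = [k for k, n in enumerate(inputs) for _ in range(n)]
    out_modes = [j for j, n in enumerate(outputs) for _ in range(n)]
    sub = np.asarray(U)[np.ix_(out_modes, in_modes)]  # repeated rows/cols when n > 1
    norm = (math.prod(math.factorial(n) for n in inputs)
            * math.prod(math.factorial(n) for n in outputs))
    return abs(permanent(sub)) ** 2 / norm
```

With a balanced two-mode coupler, for example `np.array([[1, 1], [1, -1]]) / np.sqrt(2)`, the probability for inputs `(1, 1)` to exit as `(1, 1)` vanishes. This is Hong–Ou–Mandel interference: two indistinguishable quanta never leave in separate modes.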

Quantum algorithms for the efficient simulation of Hamiltonian
dynamics have been a strong motivator for digital quantum computers,
such as those that use trapped ions. Promising digital algorithms
for simulating reaction dynamics and obtaining thermal
rate constants have been presented that harness the exponential
quantum speed-up. Yet, achieving fault tolerance and the high
logical-gate counts that enable these applications is extremely
challenging. Ansatz-based methods, such as the variational approach for
solving the eigenvalue problem, have reduced demands, as demonstrated
recently with superconducting qubits, but the difficulties associated
with applying such an approach to Hamiltonian dynamics have yet to
be overcome. Analogue quantum simulations, in which a quantum
system of interest can be mapped directly onto a quantum technological
platform, may enable practical advantages in the nearer term.

Progress in photonic quantum technologies over the past decade
has seen the introduction of on-chip processing of photonic quantum
information, full reprogrammability for linear optical circuitry,
and the integration of photon generation and detection. Solid-state
single-photon sources and high-efficiency detectors have
recently been demonstrated as a solution to achieving large numbers
of photons. Ultimately, basic methods to correct for photon loss are
likely to be required before photonic quantum simulations outperform
classical algorithms, but the demands on error correction for
specialized quantum simulators could be much lower than those for universal
digital quantum simulators. Here, our focus is on establishing
programmable linear optical circuitry as a core capability for simulating
the vibrational dynamics of the atoms within molecules.


Simulation procedure 

Diagonalizing the Hessian matrix of a molecule in mass-weighted coordinates
provides its vibrational spectrum and normal modes, which
define a Hamiltonian of independent quantum harmonic oscillators:

$$\hat{H} = \sum_i \hbar\omega_i\, \hat{a}_i^\dagger \hat{a}_i,$$

where $\hbar$ is the reduced Planck constant, $\omega_i$ is the angular frequency of
the $i$th mode, and $\hat{a}_i^\dagger$ and $\hat{a}_i$ are the bosonic creation and annihilation
operators of the $i$th mode. The spatial localization of vibrational energy
is important for understanding many molecular phenomena, such as
energy transport and dissociation. We therefore consider a basis
transformation

$$\hat{l}_i^\dagger = \sum_k U^{L}_{ik}\, \hat{a}_k^\dagger,$$

where $U^{L}$ is a unitary matrix, to a set of modes localized around a single
atomic site or chemical bond. Dynamics in the localized basis can then
be simulated via the model Hamiltonian

$$\hat{H}_L = \sum_{i,j} H^{L}_{ij}\, \hat{l}_i^\dagger \hat{l}_j,$$

where

$$H^{L}_{ij} = \hbar \sum_k \omega_k\, \overline{U^{L}_{ik}}\, U^{L}_{jk},$$

and the overbar denotes complex conjugation.
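The basis change to localized modes can be checked numerically. The sketch assumes the index convention $\hat l_i^\dagger = \sum_k U^L_{ik}\hat a_k^\dagger$, which is itself an assumption. Under it, the single-excitation Hamiltonian matrix in the localized basis is $\hbar\,\overline{U^L}\,\mathrm{diag}(\omega)\,(U^L)^{\mathsf T}$. The code builds this matrix in units of ħ for illustrative frequencies and a random unitary. It then confirms that the matrix is Hermitian and keeps the normal-mode frequencies as its eigenvalues. The helper names are hypothetical.

```python
import numpy as np

def localized_hamiltonian(omega, U_loc):
    """H_L / hbar in the localized basis (hypothetical helper).

    Assumed convention: l_i^dagger = sum_k U_loc[i, k] a_k^dagger, so
    H_L[i, j] / hbar = sum_k conj(U_loc[i, k]) * omega[k] * U_loc[j, k].
    """
    omega = np.asarray(omega, dtype=float)
    return np.conj(U_loc) @ np.diag(omega) @ U_loc.T

def random_unitary(m, seed=0):
    """A random m x m unitary (QR of a complex Gaussian matrix; not exactly Haar)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m))
    q, _ = np.linalg.qr(z)
    return q
```

Because the change of basis is unitary, it only mixes the modes. The spectrum of the single-excitation matrix is therefore the same set of normal-mode frequencies.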

This general model can be simulated directly for $m$ vibrational modes
of any given molecule with a linear optical chip that can be
programmed to implement any unitary operation over $m$ modes.
Reconfiguring such a device to implement the unitary transfer matrix
$U(t_j) = \exp(-iH^{L}t_j/\hbar)$ for a series of time steps $\{t_j\}$ enables simulation
of the Hamiltonian $\hat{H}_L$ on any initial multi-mode vibrational state via
its mapping to a multi-mode optical input state. Here, we use a
silica-on-silicon integrated photonic chip that is fully programmable
over six waveguides via 30 thermo-optic phase shifters to perform
molecular simulations of up to six-mode vibrational systems. We
simulate initial states of up to four vibrational quanta, with states of up
to four single photons, produced from spontaneous parametric
downconversion sources. Photons are detected with single-photon counting
modules. The number and pattern of photons collected at the output
of the optical modes for each circuit configuration are governed by the
probabilities for the molecule to be found in the corresponding
vibrational states at the corresponding time step.
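The following sketch starts from a Hermitian single-excitation Hamiltonian matrix, given in units of ħ (that is, as angular frequencies). From it, it computes the transfer matrices $U(t_j)$ that would be programmed on the chip and the single-excitation populations they imply. Diagonalizing with `numpy.linalg.eigh` avoids a SciPy dependency. The two-mode coupling example and the function names are hypothetical.

```python
import numpy as np

def transfer_matrix(H, t):
    """U(t) = exp(-i H t) for a Hermitian H given in angular-frequency units."""
    evals, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(-1j * evals * t)) @ V.conj().T

def single_excitation_populations(H, start_mode, times):
    """Row j: probability of finding one quantum, launched in start_mode,
    in each mode at time times[j]."""
    return np.array([np.abs(transfer_matrix(H, t)[:, start_mode]) ** 2 for t in times])
```

For two modes coupled with strength g (`H = [[0, g], [g, 0]]`), the population of the second mode follows sin²(gt). For a single excitation the probabilities are simply |U_jk|². Multi-excitation outcomes instead require permanents of submatrices of U(t_j).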


Vibrational dynamics of four-atom molecules 

Thioformaldehyde (H2CS), a key molecule for spectroscopic experiments,
is shown in Fig. 1a with its normal-mode spectrum. The six
localized vibrational modes of H2CS comprise two CH stretch modes,
two CH bend modes, a CS stretch mode and a wagging mode, which
are mapped to our photonic chip from the normal-mode basis, as
depicted conceptually in Fig. 1b. We initialized the simulation for the
state $|\psi\rangle \propto |1_{\mathrm{CHs1}}, 1_{\mathrm{CHs2}}\rangle + \lambda|2_{\mathrm{CHs1}}, 2_{\mathrm{CHs2}}\rangle$ (with small squeezing
parameter $\lambda$), which consists of multiple excitations superposed over
the two CH stretch modes (‘CHs1’ and ‘CHs2’), by injecting the
two-mode squeezed vacuum state that was produced by the spontaneous
parametric downconversion source into the two waveguides that
correspond to the CH stretch modes. Photons were collected over a series
of circuit configurations that correspond to time steps of the H2CS
local-basis Hamiltonian. In Fig. 1c we display the experimentally
simulated evolution of the probabilities for excitations to be found in only
the CH stretch modes, in only the CH bend modes and shared between
these stretch and bend modes, for the two-excitation (upper panel) and
four-excitation (lower panel) subspaces.

Dynamics in the two-excitation subspace involve both excitations 
oscillating between stretch and bend modes via the intermediate state 
in which one excitation is in each of the subspaces. The $L^1$ distance
$D = \frac{1}{2}\sum_i |p_i - q_i|$ between the results for an experimentally simulated time step ($p$) and
the ideal distribution ($q$) is averaged over all time steps to give
$D = 0.06 \pm 0.03$. In the four-excitation subspace, in which both of
the stretch modes are initially doubly occupied, the experimentally 
simulated evolutions of probabilities for both stretch modes to remain 
doubly occupied, for both bend modes to become doubly occupied, 
and for combinations of one doubly occupied stretch mode and 
one doubly occupied bend mode are shown. The apparent damping of 
the oscillatory behaviour between these probabilities is attributable to 
the combinatorially growing space of multiple excitations available to 
the evolving state. The distance between the experimentally simulated 
and ideal evolutions for the full four-photon distributions, averaged 
over all time steps, is $D = 0.16 \pm 0.07$. The full distributions for these
and all subsequent experiments are provided in Supplementary 
Information. 
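The distance figures quoted in this section can be reproduced from the outcome distributions with a short helper. This is a sketch that assumes the normalized form D = ½Σ_i |p_i − q_i|, so D runs from 0 (identical distributions) to 1 (disjoint support). How the quoted uncertainties were estimated is not stated here, so the sample standard deviation over time steps used below is an assumption. The function names and the example distributions are hypothetical.

```python
import numpy as np


def l1_distance(p, q) -> float:
    """Normalized L1 distance D = 0.5 * sum_i |p_i - q_i| between two distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return 0.5 * float(np.sum(np.abs(p - q)))


def mean_distance(experimental, ideal):
    """Average D over time steps; the spread is a sample standard deviation (assumption)."""
    d = np.array([l1_distance(p, q) for p, q in zip(experimental, ideal)])
    spread = float(d.std(ddof=1)) if len(d) > 1 else 0.0
    return float(d.mean()), spread


# Hypothetical three-outcome distributions at two simulated time steps
experimental = [[0.48, 0.50, 0.02], [0.10, 0.85, 0.05]]
ideal = [[0.50, 0.50, 0.00], [0.10, 0.90, 0.00]]
print(mean_distance(experimental, ideal))
```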

Because time is a programmable parameter in our simulator, we are 
able to study molecular vibrations whose evolution involves different 
timescales, such as the local CH stretch mode in H2CS, which is a 
superposition of normal modes with lower and higher frequencies. The 
probability for a single excitation to remain localized in a CH stretch 
mode was simulated on two timescales that differ by an order of mag- 
nitude. Heralded single photons were injected into the mode that cor- 
responds to a local CH stretch. The circuit was programmed to 
implement sets of unitary transformations that correspond to a short 
(30 fs) high-resolution window and that correspond to a longer (300 fs) 
low-resolution window, the behaviour of which can be observed by 
averaging over the high-resolution windows. In Fig. 1d we display data 
for these simulations, which exhibit both higher- and lower-frequency 
oscillations. Averaging over both evolutions gives a mean distance of 
$D = 0.014 \pm 0.006$.
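The two timescales arise because a local mode is a superposition of normal modes. The sketch below assumes a single excitation whose local-mode state has weights |c_k|² on normal modes with angular frequencies ω_k. Its survival probability is then P(t) = |Σ_k |c_k|² e^{−iω_k t}|². Closely spaced frequencies produce a slow beat, and well-separated ones produce fast oscillations. The wavenumbers and weights are hypothetical placeholders, not the H2CS values.

```python
import numpy as np

C_CM_PER_FS = 2.99792458e-5  # speed of light in cm/fs


def wavenumber_to_angular_freq(nu_cm) -> np.ndarray:
    """Convert wavenumbers (cm^-1) to angular frequencies (rad/fs)."""
    return 2.0 * np.pi * C_CM_PER_FS * np.asarray(nu_cm, dtype=float)


def survival_probability(weights, omegas, t) -> np.ndarray:
    """P(t) = |sum_k w_k exp(-i omega_k t)|^2 with w_k = |c_k|^2 (weights sum to 1)."""
    weights = np.asarray(weights, dtype=float)
    phases = np.exp(-1j * np.outer(np.atleast_1d(t), omegas))  # shape (n_times, n_modes)
    return np.abs(phases @ weights) ** 2


# Hypothetical local stretch: two nearby stretch normal modes plus a small far-off admixture
weights = [0.48, 0.48, 0.04]
omegas = wavenumber_to_angular_freq([3000.0, 3050.0, 1500.0])
short_window = np.linspace(0.0, 30.0, 61)   # fs, high-resolution window
long_window = np.linspace(0.0, 300.0, 61)   # fs, low-resolution window
p_short = survival_probability(weights, omegas, short_window)
p_long = survival_probability(weights, omegas, long_window)
```

With these placeholder values, the 50 cm⁻¹ splitting gives a beat period of roughly 670 fs, which shows up as a slow decay across the 300-fs window. The 1,500 cm⁻¹ admixture gives oscillations with a period of about 22 fs, which are only resolved in the 30-fs window.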

Our six-mode simulator can explore the full space of vibrational 
dynamics for a general molecule of up to four atoms, as we demonstrate
for P4, SO3, HNCO, HFHF and N4. In Fig. 1e–i we show the time
evolution of a single excitation initially prepared in a local stretch
mode, plotting the change in the occupation probability of a second, spectrally
overlapped (coupled) local mode. We observe dynamics with
varying characteristic times governed by the vibrational spectra of the 
molecules. Owing to its geometry and bonding structure, P4 has the
longest-period oscillations between opposing stretches, with SO3 
showing similar stretch-mode coupling behaviour on shorter times- 
cales. By contrast, HNCO and HFHF display faster dynamics with 
increased mode coupling between hydrogen-bond stretches and 
bends. In Fig. 1i we show the dynamics of both a single excitation and 
two excitations initially prepared in stretch modes of N4. The addi- 
tional structure in the vibrational spectrum and the introduction of 
multi-photon quantum interference result in a more complex time
dependence of the detection probabilities. The average $L^1$ distance over
all of these experiments is $D = 0.022 \pm 0.007$.
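The multi-photon interference mentioned above enters through matrix permanents. This sketch uses the standard linear-optics result for indistinguishable photons: P(T | S) = |Perm(U_{T,S})|² / (Π s_i! Π t_j!), where U_{T,S} repeats the rows and columns of U according to the output and input occupations. The brute-force permanent is only meant for a handful of photons. The helper names are hypothetical.

```python
import itertools
import math

import numpy as np


def permanent(m: np.ndarray) -> complex:
    """Permanent by direct expansion over permutations (fine for a few photons)."""
    n = m.shape[0]
    total = 0j
    for perm in itertools.permutations(range(n)):
        term = 1 + 0j
        for row, col in enumerate(perm):
            term *= m[row, col]
        total += term
    return total


def fock_probability(u: np.ndarray, inputs, outputs) -> float:
    """Probability of output occupations given input occupations (indistinguishable photons).

    inputs/outputs are occupation-number lists, e.g. [1, 1, 0] = one photon in modes 0 and 1.
    """
    cols = [k for k, n in enumerate(inputs) for _ in range(n)]
    rows = [j for j, n in enumerate(outputs) for _ in range(n)]
    if len(rows) != len(cols):
        return 0.0  # a lossless linear network conserves photon number
    sub = u[np.ix_(rows, cols)]
    norm = math.prod(math.factorial(n) for n in inputs) * math.prod(math.factorial(n) for n in outputs)
    return abs(permanent(sub)) ** 2 / norm


bs = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)  # balanced two-mode coupler
print(fock_probability(bs, [1, 1], [1, 1]))  # vanishes: Hong-Ou-Mandel interference
```

Applied to U(t_i) at each time step, the same function gives the two-excitation detection probabilities. Their time dependence contains the interference terms that are absent for a single excitation.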


Decoherence and energy transfer in NMA 

The flow of vibrational energy in molecules is a fundamental process
for chemical reaction rates and functionality in biomolecules*®. The
vibrational quantum dynamics of a molecule within an environment
can be described by the interplay of coherent unitary evolution and
incoherent dephasing that results from random fluctuations of the
vibrational frequencies, a process referred to as spectral diffusion.
N-methylacetamide (NMA) is the simplest molecular model (Fig. 2a)
of the peptide bond in proteins, where quantum coherence may
have a role in energy transfer*?. In this section, we simulate a model
for intramolecular energy transport in NMA in the presence of
dephasing.

Fig. 1 | Simulating the vibrational dynamics of four-atom molecules in
the harmonic approximation. a, Schematic evolution of a localized CH
stretch mode (diagonal black arrow) in H2CS, with its composition from
normal modes plotted below. b, The evolution of the normal modes
($\exp(-i\hat{H}t/\hbar)$), shown schematically in the centre of the top layer, is
unitarily mapped ($U_l$ and $U_l^\dagger$) to a set of local vibrational modes, shown
schematically at the ends of the top layer. This transformation is then
mapped to a time-dependent unitary transfer matrix ($U(t)$; middle layer).
Simulations of photonic states under this evolution are then implemented
by a reconfigurable photonic chip (bottom layer). c, An initial
superposition of two and four excitations evolving in the localized stretch
modes is simulated by injecting a two-mode squeezed vacuum state into
the corresponding optical modes and collecting photon statistics for the
sequence of simulated time steps. Top and bottom panels show results for
the two- and four-excitation subspaces, respectively (see insets).
d, Simulations on two timescales of the evolving probability for a single
excitation to remain in a CH stretch mode. Blue squares represent the
mean probability over a 30-fs window (as per left panel). e–i, The
simulated evolution of a single excitation in P4 (e), SO3 (g), HNCO (f) and
HFHF (h) between a local stretch mode (black) and another coupled local
mode (blue). The local modes are represented diagrammatically alongside
the spectral intensities of the normal modes involved. For N4 (i), results
are also shown for the evolution of two excitations. All data are plotted
together with ideal theoretical curves; error bars displaying 1 s.d. from
Poissonian statistics are very small.


We consider a subspace that spans six backbone vibrational
modes, which support a basis of approximately localized vibrational
modes, including two rocking modes (curved arrows in Fig. 2)
and two stretch modes (straight arrows in Fig. 2). Uniform
dephasing between all modes is achieved by a time-dependent
statistical averaging over the set of experiments with transfer
matrices $U(t_i, k) = U_l Z(k) U_l^\dagger \exp(-iH' t_i/\hbar)$, where $Z(k)$ are
Heisenberg–Weyl matrices (defined in Supplementary Information)
labelled by $k$ and the average is taken over $k$ at each time step.

Fig. 2 | Quantum energy transfer and dephasing in NMA. a, A six-mode
vibrational subspace of the NMA molecule is considered, with the spectral
components of three localized modes colour-coded as per the arrows
in b. b, Experimental simulation results for the probability of a single
excitation (black points) that is initially in a local rocking mode (black
arrow) at one end of the molecule and its transfer (blue and grey points)
to local modes at the opposite end (blue and grey arrows) when subject to
a dephasing channel with $T_2 = 0.53$ ps. c, Experimental simulation results
for the evolution of a two-excitation state (black points) that is initially
in separate local modes (black arrows) and its probability (blue points)
of being found bunched in the NH stretch mode (blue arrows) under the
same dephasing channel. Solid lines represent theory. The dashed blue
line plots a theoretical curve for distinguishable (or classical) excitations
to be found bunched in the NH stretch mode. d, Experimental simulation
results for the total probability of measuring a fully anti-bunched state
of two excitations with the same initial state as for c (black points with
solid black theory curve) and of measuring a fully anti-bunched state of
three excitations initialized in the modes shown in b (black points with
dot-dashed theory curve). All error bars represent 1-s.d. estimates from
Poissonian statistics.

Using a single photon, we simulated the probability for a single
excitation initialized in a local rocking mode at one end of the molecule
to be transferred to two localized modes (a rocking mode and a CC
stretch mode) at the opposite end of the molecule. The experimental
results in Fig. 2b show dynamics that are initially oscillatory,
with vibrational energy transfer between the rocking modes at either
end of the molecule via an intermediate CC stretch mode. As dephasing
progressively suppresses coherence, the evolution relaxes towards a
steady state. Peak probabilities for energy to be
localized at either end of the molecule are higher under quantum coherent
dynamics than under purely ballistic classical dynamics. We used
a $T_2$ time constant of coherence decay of 0.53 ps, but any time constant
can be simulated by changing only the post-processing of data.

Simulating multiple excitations allows us to investigate the interplay
of dephasing and quantum interference for multi-excitation energy
transport. By injecting one photon into the waveguide that corresponds
to the rocking mode and another photon into the waveguide that corresponds
to the CC stretch mode, which are each localized at opposite
ends of the NMA molecule (black arrows in Fig. 2c), we simulated
the change in the probability for this state and for the state in which
both excitations ‘bunch’ in an NH stretch mode (double blue arrows in
Fig. 2c). The results in Fig. 2c show more complex oscillatory transfer
between these bunched and anti-bunched states, which again tends
towards a steady state. However, after full dephasing has occurred, the
probability for two excitations to be bunched in the NH stretch mode
is twice as high for excitations that behave as indistinguishable bosonic
particles as for excitations that behave as distinguishable or classical
particles (such as two excitations that pass through the molecule at
different times).

For a given molecule, the probability that no bunching occurs (multiple
excitations not localized around the same bond) generally decreases
as the number of excitations increases“’. In Fig. 2d the probability for
the subspace of no-bunching events is simulated for two and three
excitations under fully coherent dynamics. The initial state for the
two-excitation evolution is the same as in the previous example; the
initial state for the three-excitation evolution comprises an excitation in
each of the local modes shown in Fig. 2b. The average distances across
all single-, two- and three-excitation distributions in these examples are
$0.017 \pm 0.005$, $0.05 \pm 0.01$ and $0.14 \pm 0.07$, respectively.
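The uniform dephasing used in this section replaces a single transfer matrix by an ensemble indexed by k. Each Z(k) corresponds to a separate chip configuration. Detection probabilities are linear in the density matrix, so averaging the measured distributions over k is equivalent to applying a dephasing channel. The sketch below works in the single-excitation picture and makes two assumptions. First, Z(k) are the diagonal "clock" matrices Z(k) = diag(1, ω^k, …, ω^{(d−1)k}) with ω = e^{2πi/d}; the exact definition is in the Supplementary Information. Second, partial dephasing is modelled by a hypothetical post-processing weight e^{−t/T2} on the undephased result. This shows how a T2 can be chosen after the fact, but it is not necessarily the weighting used in the experiment.

```python
import numpy as np


def clock_matrix(d: int, k: int) -> np.ndarray:
    """Heisenberg-Weyl 'clock' matrix Z(k) = diag(omega**(j*k)), omega = exp(2*pi*i/d) (assumed form)."""
    omega = np.exp(2j * np.pi / d)
    return np.diag(omega ** (np.arange(d) * k))


def full_dephasing(rho: np.ndarray) -> np.ndarray:
    """Uniform average of Z(k) rho Z(k)^dagger over k = 0..d-1, which removes all coherences."""
    d = rho.shape[0]
    out = np.zeros_like(rho, dtype=complex)
    for k in range(d):
        z = clock_matrix(d, k)
        out += z @ rho @ z.conj().T  # element (j, l) picks up omega**((j - l) * k)
    return out / d


def partial_dephasing(rho: np.ndarray, t: float, t2: float) -> np.ndarray:
    """Hypothetical post-processing: keep rho with weight exp(-t/T2), else fully dephase."""
    keep = np.exp(-t / t2)
    return keep * rho + (1.0 - keep) * full_dephasing(rho)


psi = np.ones(3) / np.sqrt(3)
rho = np.outer(psi, psi.conj())  # one excitation shared coherently by three local modes
for t in (0.0, 0.53, 2.0):  # ps, with T2 = 0.53 ps as quoted in the text
    print(t, abs(partial_dephasing(rho, t, t2=0.53)[0, 1]))
```

The average over k annihilates element (j, l) unless j = l, because Σ_k ω^{(j−l)k} vanishes for 0 < |j − l| < d. Populations are untouched, and only the weights applied in post-processing depend on T2.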


Vibrational relaxation in liquid water 

We now consider extensions to the idealized model of uncoupled har- 
monic oscillators to account for more realistic situations, including 
energy dissipation and anharmonic potentials. We choose models for 
H2O to demonstrate our techniques.

For a molecule that interacts with its environment, vibrational energy 
is exchanged via intra- and intermolecular coupling to other degrees of 
freedom, eventually leading to thermalization. This process is known 
as vibrational relaxation, and its pathways for H2O remain an area 
of investigation*!*”. Here we simulate the relaxation of H2O via an 
amplitude-damping model (Fig. 3a). 

We consider a Lindblad master equation, which results in a set of 
time-dependent Kraus operators that can be simulated via an ensem- 
ble of transfer matrices. This evolution cannot be described as a con- 
vex sum of unitary evolutions as in the previous section; however, the 
transfer matrices can be realized within a unitary matrix of twice the 
size, via unitary dilation**. Because H2O has three vibrational modes,
its three-dimensional (non-unitary) transfer matrices can be realized 
within a six-dimensional unitary matrix and implemented on our six- 
mode chip (Fig. 3b). We used experimentally measured relaxation 
rates $\{\Gamma_i\}$ for liquid water at room temperature“! in the model.
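The unitary-dilation step can be sketched with the standard Halmos construction. Any contraction K (operator norm at most 1) sits in the top-left block of the 2n × 2n unitary [[K, (I − KK†)^{1/2}], [(I − K†K)^{1/2}, −K†]]. This may not be the exact dilation used on the chip. The 3 × 3 operator below is a hypothetical damped mode-mixing step, built from illustrative decay rates and a fixed unitary. In the single-excitation picture, the amplitude routed into the three extra modes plays the role of population that has relaxed to the ground state; this interpretation is an assumption of the sketch.

```python
import numpy as np


def psd_sqrt(m: np.ndarray) -> np.ndarray:
    """Principal square root of a Hermitian positive-semidefinite matrix."""
    w, v = np.linalg.eigh(m)
    return v @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T


def unitary_dilation(k: np.ndarray) -> np.ndarray:
    """Embed a contraction K (||K|| <= 1) in the top-left block of a 2n x 2n unitary."""
    n = k.shape[0]
    eye = np.eye(n)
    top_right = psd_sqrt(eye - k @ k.conj().T)
    bottom_left = psd_sqrt(eye - k.conj().T @ k)
    return np.block([[k, top_right], [bottom_left, -k.conj().T]])


# Hypothetical 3-mode step: K = D @ V, with D a diagonal decay factor and V a unitary mode mixer
gammas = np.array([0.5, 0.5, 1.0])  # illustrative rates (arbitrary units)
t = 0.8
v = np.linalg.qr(np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]]))[0]
k = np.diag(np.exp(-0.5 * gammas * t)) @ v
u = unitary_dilation(k)  # 6 x 6, matching a six-mode chip
```

Unitarity follows from the identity K(I − K†K)^{1/2} = (I − KK†)^{1/2}K, which makes the off-diagonal blocks of UU† cancel.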

We simulated the thermalization of an excitation in a local OH 
stretch mode via the symmetric bend normal mode to its ground state 
of no excitations. In Fig. 3c we show the probability of measuring the 
excitation in the two local stretch modes (left panel) and the symmetric 
bend mode (right panel). Oscillations between the two high-energy 
stretch modes decay as the population is transferred via the lower-energy
bend mode to the ground state. The average $L^1$ distance in these
experiments was $D = 0.024 \pm 0.007$.
Fig. 3 | Vibrational relaxation and anharmonic evolution for H2O.
a, Energy-level diagram for single-excitation harmonic levels and the
ground state of H2O (right) along with the spectral components of the two
local OH stretch modes (black and grey) and the symmetric bend normal
mode (blue) (left). $\Gamma_{1,2,3}$ are the characteristic decay rates obtained from
experiments. b, The open-system dynamics of vibrational relaxation,
described by the Lindblad equation $\dot{\rho} = \mathcal{L}(\rho)$ where $\rho$ is the vibrational
state, can be simulated by statistically averaging the evolution under a set
of linear operators implemented via unitary dilation. The Kraus operators
in the localized basis, $K(t)$, are dilated into a unitary matrix by increasing
the dimension (blocked-out parts of the matrix). c, Experimental results
for the simulated evolution of the probability for a single excitation that is
initially in one OH stretch mode (black points) to be found in the other
stretch mode (grey points) and in the symmetric bend (blue points) under
relaxation dynamics. Solid lines are theoretical curves. d, Spectrum of two
excitations in bunched (black) and anti-bunched (blue) local stretch
modes for a harmonic (dashed) and anharmonic (solid) model. e, The
anharmonic evolution is characterized by anharmonic potentials for single
oscillators (top inset, where R is the nuclear distance and V(R) is the
potential energy at R) and cross-mode coupling between oscillators
(bottom inset). These are implemented via a measurement-induced
nonlinearity using an ancillary photon and modes. f, Experimental results
for the simulated evolution of the probability for two excitations that are
initially bunched in local stretch modes to be found in the anti-bunched
state (left) and the bunched state (right) under both models (dashed,
harmonic; solid, anharmonic). All error bars represent 1-s.d. estimates
from Poissonian statistics.


Anharmonic potentials in H2O

Potential energy surfaces of real molecules are anharmonic, and we now
consider simulations in this regime, depicted in Fig. 3d. In addition to
the second derivatives in the Taylor expansion of the potential energy
surface, as per the harmonic approximation, we now include all third
derivatives and the semi-diagonal quartic derivatives. Applying vibrational
perturbation theory yields a new Hamiltonian:

$$\hat{H}_a = \hat{H} + \sum_{i \le j} \frac{\chi_{ij}}{2}\left(\hat{a}_i^\dagger \hat{a}_i + \hat{a}_j^\dagger \hat{a}_j + 2\,\hat{a}_i^\dagger \hat{a}_j^\dagger \hat{a}_i \hat{a}_j\right)$$

where $\hat{H}$ is the harmonic Hamiltonian and $\chi_{ij}$ are the perturbation-theory
coefficients.

Implementing this Hamiltonian requires interactions between
photons, a key challenge in quantum information processing.
Demonstrations of enhanced single-photon interactions have required,
for example, an artificial Kerr medium using superconducting
circuits“’, fibre coupling a single atom and a microresonator®, or
coupling to a single quantum dot in a micropillar cavity**. Importantly,
the interactions that are required to implement perturbative models
such as $\hat{H}_a$ can be weaker than the fully entangling operations and
controlled-π phase gates that are used for universal quantum computing,
with a potentially lower demand for reprogrammable nonlinear
optics.


Here, instead, we demonstrate an approach based on measure- 
ment-induced nonlinearities, which are in principle scalable for 
all-linear-optical quantum computing but involve a large overhead. It 
is possible to implement a conditional π phase shift on a two-photon
Fock state using an ancillary photon and additional optical modes*’.
Using newly derived nonlinear phase-shift gates, we are able to imple- 
ment arbitrary phase shifts between the zero-, one- and two-photon 
Fock states of an optical mode. 
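What makes such a gate nonlinear can be seen at the level of the target operation on one mode truncated to {|0⟩, |1⟩, |2⟩}. The sketch does not reproduce the measurement-induced construction, which needs an ancillary photon, extra modes and heralding. A passive phase shifter can only apply |n⟩ → e^{inφ}|n⟩, so any pair (φ1, φ2) with φ2 ≠ 2φ1 (mod 2π) requires a nonlinearity. The choice (0, π) is the nonlinear sign shift on the two-photon component. The `anharmonic_phases` helper assumes, for illustration, a single-mode term of the form χn² with ħ = 1. All names are hypothetical.

```python
import numpy as np


def fock_phase_gate(phi1: float, phi2: float) -> np.ndarray:
    """Diagonal gate on the {|0>, |1>, |2>} Fock subspace of one optical mode."""
    return np.diag(np.exp(1j * np.array([0.0, phi1, phi2])))


def is_linear_optical_phase(phi1: float, phi2: float, tol: float = 1e-9) -> bool:
    """A passive phase shifter gives |n> -> exp(i n phi)|n>, i.e. phi2 = 2*phi1 (mod 2*pi)."""
    diff = (phi2 - 2.0 * phi1) % (2.0 * np.pi)
    return bool(min(diff, 2.0 * np.pi - diff) < tol)


def anharmonic_phases(chi: float, t: float):
    """Hypothetical single-mode term chi*n**2: phases -chi*n**2*t for n = 1 and n = 2."""
    return -chi * t, -4.0 * chi * t


phi1, phi2 = anharmonic_phases(chi=0.1, t=1.0)
print(is_linear_optical_phase(phi1, phi2))  # the anharmonic phases are not a linear phase shift
```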

We simulate and compare harmonic and anharmonic models of 
vibration for H2O, restricting to the subspace of stretch modes. Two
photons injected together into the top mode of the chip serve to simulate
two excitations initialized in a superposition of the eigenstates of
the harmonic model that correspond to a local OH stretch. As shown
in Fig. 3e, when simulating the anharmonic model, this input state is
understood as the same superposition of the new energy eigenstates of
$\hat{H}_a$, while a third photon injected into the third optical mode serves as
the ancillary system. 

In Fig. 3f we show the results of simulating the probabilities for these 
two vibrational excitations to remain bunched or to anti-bunch, under 
harmonic ($\hat{H}$) and anharmonic ($\hat{H}_a$) models. The difference in the
detection patterns between the two models is a function of their different
spectra (Fig. 3d). The probability of detecting a single excitation in
each of the modes (anti-bunched; Fig. 3f, left panel) acquires a simple
frequency shift for the anharmonic evolution that corresponds to the
adjusted energy levels (Fig. 3d, top panel). By contrast, the probabilities
for the state to remain doubly occupied display markedly different 
dynamics between the harmonic and anharmonic cases (Fig. 3f, right 
panel). This is a result of the three vibrational eigenstates no longer 
being equally spaced in energy (Fig. 3d, bottom panel), which intro- 
duces new frequencies in the evolution. For this set of experiments, the 
average distances between the ideal and experimental distributions for 
the harmonic and anharmonic cases are $0.02 \pm 0.01$ and $0.06 \pm 0.02$,
respectively. 
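The loss of equal spacing can be checked with the standard second-order term-value expression E(n) = Σ_i ω_i(n_i + ½) + Σ_{i≤j} x_ij(n_i + ½)(n_j + ½). This expression is diagonal in the normal-mode number basis and is assumed here as a stand-in for the model. Its x_ij are not claimed to equal the paper's χ_ij. For two modes, the two gaps between |2,0⟩, |1,1⟩ and |0,2⟩ differ by 2|x_11 + x_22 − x_12|, so the levels remain evenly spaced only in that special case. The frequencies and constants below are hypothetical.

```python
import itertools

import numpy as np


def vpt2_energy(n, omega, x) -> float:
    """E(n) = sum_i omega_i (n_i + 1/2) + sum_{i<=j} x_ij (n_i + 1/2)(n_j + 1/2)."""
    n = np.asarray(n, dtype=float) + 0.5
    omega = np.asarray(omega, dtype=float)
    x = np.asarray(x, dtype=float)
    e = float(np.dot(omega, n))
    for i, j in itertools.combinations_with_replacement(range(len(n)), 2):
        e += x[i, j] * n[i] * n[j]  # upper triangle of x holds the i <= j constants
    return e


def two_excitation_spacings(omega, x) -> np.ndarray:
    """Gaps between the |2,0>, |1,1> and |0,2> levels of a two-mode system."""
    levels = [vpt2_energy(n, omega, x) for n in ([2, 0], [1, 1], [0, 2])]
    return np.diff(levels)


omega = [1.0, 1.1]  # hypothetical normal-mode frequencies (arbitrary units)
x_anh = np.array([[-0.02, -0.01], [0.0, -0.03]])  # hypothetical anharmonic constants
print(two_excitation_spacings(omega, np.zeros((2, 2))))  # harmonic: equal gaps
print(two_excitation_spacings(omega, x_anh))             # anharmonic: unequal gaps
```

Unequal gaps mean the two-excitation state evolves with more than one beat frequency. This is the origin of the new frequencies in the bunched-state dynamics.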


Quantum simulation with adaptive feedback 

Adaptive feedback control (AFC) is a practical laboratory technique 
for finding optimal control fields for molecules®. AFC naturally incor- 
porates laboratory constraints to design control fields that would 
not be found either analytically or through numerical simulation. 
Nevertheless, models idealized for quantum simulation could help to 
identify new possibilities for molecular control, could enable their com- 
parison over a large number of molecules and could include quantum 
control fields. 

Our goal is to use our simulator with an adaptive feedback loop from 
its quantum measurement statistics to design initial quantum states 
for a molecule that maximally achieve a particular task over a period 
of evolution. Our example molecule is ammonia (NH3), a prototype 
for studying dissociation, including vibrationally mediated pathways, 
in which the states of its products (NH2 + H) depend on the prior
vibrational state in the ground electronic state’®. 

Our model (Fig. 4b) simulates the preparation of a vibrational 
state in the electronic ground state of the molecule. We then obtain 
the vibrational state that results from an electronic excitation by pro- 
jecting the vibrational modes of the ground state onto the vibrational 
modes of the excited state. We approximate this projection by a unitary 
transformation between the modes, $U_{\mathrm{ge}}$; however, the full transformation
can in general be achieved via single-mode squeezing, displacement 
and linear optical transformations*®. The evolution of the vibrational 
state of the electronically excited molecule is simulated under the har- 
monic Hamiltonian for the normal modes. By measuring the evolved 
state in a localized basis we identify three local NH stretch modes. 

The aim of this simulation, depicted in Fig. 4c, is to let an AFC algo- 
rithm find the initial states of two vibrational excitations (in the mole- 
cule in the electronic ground state) that result in a maximal total 
probability of bunching in any of the three NH stretch modes (of the 
electronically excited molecule) over the first 10 fs of evolution, which 
we associate with a preferred dissociation pathway, while suppressing 
other bunched events, which we associate with other pathways. The 
algorithm begins with a trial state of two photons that simulates two 
excitations superposed randomly over five of the normal vibrational 
modes. State preparation, which is parameterized by the vector $\boldsymbol{\theta}$, is
optimized iteratively by programming the simulator to implement
$U(\boldsymbol{\theta}, t_i) = U_l \exp(-iHt_i/\hbar)\, U_{\mathrm{ge}}\, U(\boldsymbol{\theta})$, where $U_{\mathrm{ge}}$ relates to the
transformation between the ground- and excited-state normal modes and
$U_l$ relates to the transformation between the excited-state normal and
local modes.

Fig. 4 | AFC algorithm for a dissociation pathway in NH3. a, The spectral
decomposition of an NH stretch mode in the electronic excited state of
NH3. b, A two-excitation vibrational state, parameterized by $\boldsymbol{\theta}$, is
initialized in the ground-state vibrational modes ($|\psi(\boldsymbol{\theta}, 0)\rangle$) of NH3. The
electronic state ($|\chi(\boldsymbol{\theta}, t)\rangle$) is excited and the localization of vibrational
energy is measured over time. These measurements are fed back
to the state preparation to increase energy localization in NH stretch
modes, promoting a particular dissociation pathway for this molecule.
c, This scenario is simulated via a parameterized unitary for state preparation
$U(\boldsymbol{\theta})$, a transformation between the ground-state and excited-state modes
$U_{\mathrm{ge}}$, evolution under the excited-state modes ($\exp(-iHt/\hbar)$) and
measurement in a localized basis via $U_l$. The resulting photon statistics are
fed back through an AFC algorithm to set $\boldsymbol{\theta}$ for the next iteration (after
calculating the cost function $C$). d, The left panel displays an example set
of experimental results that show the full distributions for bunching in the
NH stretch modes (red), bunching in the remaining three localized modes
(blue) and detection in anti-bunched patterns (yellow) for five time steps
at iteration numbers 1 (bottom), 175 (middle) and 399 (top). The right
panel shows $-C$ measured at every iteration.

An example experimental trial is shown in Fig. 4d. We used a Nelder–Mead
simplex method to minimize the cost function

$$C = -a \sum_i w_i \Delta p_i \in [-1, 1]$$

where $\Delta p_i$ is the difference between the probability of bunching in
the NH stretch modes and the probability of bunching in the remaining modes at time step $i$, $w_i$ are
weighting factors and $a$ is a normalization factor. The final value in
Fig. 4d is $C = -0.771$, starting from a random initial state with
$C = +0.337$. We repeated this experiment with six random initial states;
all resulted in similar final values of the cost function, with a mean of
$C = -0.845$.
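The feedback loop can be sketched with SciPy's Nelder–Mead implementation, under several simplifying assumptions. It uses a toy single-excitation model over three modes instead of two photons and bunching probabilities. Random unitaries stand in for the five programmed stages U(θ, t_i), and exact probabilities replace measured photon counts. Δp_i is taken as P(target mode) minus P(other modes), and a = 1/Σw_i keeps C in [−1, 1]. The parameterization `prepare_state` and all other names are hypothetical, and SciPy's simplex routine is not claimed to be the implementation used in the experiment.

```python
import numpy as np
from scipy.optimize import minimize


def random_unitary(d: int, rng: np.random.Generator) -> np.ndarray:
    """Haar-random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))  # fix column phases


def prepare_state(theta: np.ndarray) -> np.ndarray:
    """Hypothetical parameterization of a normalized single-excitation state over three modes."""
    a, b, phi1, phi2 = theta
    return np.array([
        np.cos(a),
        np.sin(a) * np.cos(b) * np.exp(1j * phi1),
        np.sin(a) * np.sin(b) * np.exp(1j * phi2),
    ])


def cost(theta, evolutions, weights, target=0) -> float:
    """C = -a * sum_i w_i * dp_i, with dp_i = P(target) - P(other modes) at time step i."""
    dp = []
    for u in evolutions:
        probs = np.abs(u @ prepare_state(theta)) ** 2
        dp.append(probs[target] - (probs.sum() - probs[target]))
    a = 1.0 / np.sum(weights)  # normalization keeps C in [-1, 1]
    return float(-a * np.dot(weights, dp))


rng = np.random.default_rng(seed=1)
evolutions = [random_unitary(3, rng) for _ in range(5)]  # stand-ins for five time steps
weights = np.ones(5)
theta0 = rng.uniform(0.0, np.pi, size=4)
result = minimize(cost, theta0, args=(evolutions, weights), method="Nelder-Mead",
                  options={"maxiter": 400, "xatol": 1e-6, "fatol": 1e-9})
print(f"C: {cost(theta0, evolutions, weights):+.3f} -> {result.fun:+.3f}")
```

In the experiment each cost evaluation comes from finite photon counts, so the objective is noisy. A derivative-free simplex search needs only function values, which suits this kind of hardware-in-the-loop optimization.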


Discussion 

We have introduced photonics as a platform for simulating the vibrational
quantum dynamics of molecules within the harmonic, perturbatively
anharmonic and Lindblad models, with a photonic chip playing
the part of a programmable molecule. Scaling up and extending these 
techniques to more involved Hamiltonians with highly anharmonic 
atomic potentials and electronic degrees of freedom, and to realize an 
advantage over classical simulation techniques*®, presents important 
and interesting research directions. 

Device errors that must be addressed include inevitable flaws in 
circuit fabrication and operation, photon distinguishability and pho- 
ton loss. Although the precision that is required in the setting of any 
individual circuit parameter increases with dimension”, linear optical 
elements with extinction of more than 60 dB have been demonstrated”. 
Indistinguishability, or visibilities, between independent photons have 
been reported at 95% for on-chip sources*” and at more than 90% for 
time bins of a solid-state photon source**. Although ultralow-loss 
integrated circuitry has been demonstrated”, photon loss is a funda- 
mental error in photonics; basic methods that alleviate some of this 
error would provide substantial improvements in rates for the class of 
experiments demonstrated here. The development of programmable 
nonlinear optics at the quantum level is a key functionality for quantum 
technologies and remains an important challenge for the field. With 
modest progress in these areas, our approach could yield an early class 
of practical quantum simulations that operate beyond current classical 
limits. 


Data availability 

The data shown in the plots and that support the findings of this study 
are available from the Research Data Repository of the University of 
Bristol at https://doi.org/10.5523/bris.2ymwd4m50qkt26mtrhpli3d1i . 
Pyramidal cell regulation of interneuron
survival sculpts cortical networks 


Fong Kuan Wong, Kinga Bercsenyi, Varun Sreenivasan, Adrian Portalés, Marian Fernandez-Otero & Oscar Marin


Complex neuronal circuitries such as those found in the mammalian cerebral cortex have evolved as balanced networks 
of excitatory and inhibitory neurons. Although the establishment of appropriate numbers of these cells is essential for 
brain function and behaviour, our understanding of this fundamental process is limited. Here we show that the survival 
of interneurons in mice depends on the activity of pyramidal cells in a critical window of postnatal development, during 
which excitatory synaptic input to individual interneurons predicts their survival or death. Pyramidal cells regulate 
interneuron survival through the negative modulation of PTEN signalling, which effectively drives interneuron cell 
death during this period. Our findings indicate that activity-dependent mechanisms dynamically adjust the number of 
inhibitory cells in nascent local cortical circuits, ultimately establishing the appropriate proportions of excitatory and
inhibitory neurons in the cerebral cortex.


In the adult neocortex, approximately one-sixth of neurons are inhibitory
γ-aminobutyric acid-containing (GABAergic) interneurons’, and
this ratio is relatively stable across cortical regions and species regard- 
less of the total number of neurons*®. The cellular balance between 
excitation and inhibition is critical for brain function and is likely to 
be disrupted in a number of neuropsychiatric conditions’®. However, 
the mechanisms that regulate the establishment of appropriate numbers 
of excitatory and inhibitory neurons in the cerebral cortex remain 
largely unknown. 

Programmed cell death, also known as apoptosis, is an essential 
mechanism that sculpts the central and peripheral nervous systems 
during development'®-'”. The death of developing neurons is mediated 
by an evolutionarily conserved signalling pathway that involves the 
pro-apoptotic BCL2 family members BAX and BAK’?. Previous studies 
have shown that both cortical pyramidal cells and GABAergic interneu- 
rons undergo extensive cell death during postnatal development'*'*, 
which suggests that apoptosis may contribute to the establishment of 
balanced networks of excitatory and inhibitory neurons in the cerebral 
cortex. However, the temporal relationship and interdependency of the 
programmed cell death periods for both populations of neurons have 
not been explored in detail. 


Concatenated waves of neuronal death 

To determine the developmental sequence that establishes the final 
ratio of excitatory and inhibitory neurons in the cerebral cortex, we 
estimated the absolute numbers and relative proportions of pyrami- 
dal cells and GABAergic interneurons at different postnatal stages of 
development using stereological methods in mouse strains in which 
specific classes of neurons are irreversibly labelled. We chose this 
method to estimate programmed cell death over the direct quantifica- 
tion of dying cells because classical apoptotic markers such as cleaved 
caspase-3 have non-apoptotic roles in neurons!® and are expressed only 
transiently (Extended Data Fig. 1a, b). We crossed Nex^{Cre} and Nkx2-1-Cre
mice with appropriate reporter strains (see Methods) to identify
pyramidal cells (Nex^{Cre/+};Fucci2) and GABAergic interneurons (Nkx2-1-Cre;RCL^{tdT}),
respectively. Expression of Cre under the control of the
Neurod6 locus in Nex^{Cre/+} mice labels all cortical excitatory neurons


with the exception of Cajal–Retzius cells!”. In Nkx2-1-Cre mice, Cre
specifically labels interneurons derived from the medial ganglionic 
eminence (MGE) and preoptic area (POA), including the two largest 
groups of cortical GABAergic interneurons, parvalbumin (PV) and 
somatostatin (SST)-expressing cells’.

We observed that the total number of excitatory neurons in the neo- 
cortex decreases (by about 12%) between postnatal day (P)2 and P5, 
and then remains stable into adulthood (Fig. 1a, b, e). The reduction in 
excitatory neurons affects all layers of the neocortex and not only sub- 
plate cells (Extended Data Fig. 1c-e), which are known to undergo pro- 
grammed cell death during this period”. By contrast, we found that the 
number of interneurons is steady until P5, drops extensively between 
P5 and P10 (by about 30%), and then remains constant into adult- 
hood (Fig. 1c-e). Interneuron cell loss follows the normal maturation 
sequence of MGE and POA interneurons”, with deep layer interneu- 
rons adjusting their numbers ahead of superficial layer interneurons 
(Fig. 1f). These results revealed that consecutive waves of programmed 
cell death adjust the final ratio of excitatory and inhibitory neurons in 
the developing cerebral cortex. 


Interneuron activity predicts cell death 

Our results indicated that the adjustment of interneuron numbers is 
preceded by a wave of pyramidal cell death, which suggests that these
two processes might be directly linked. As previous work has shown 
that neuronal activity and apoptosis rates are inversely correlated in the 
developing brain*!~*?, we hypothesized that pyramidal cells may impact 
interneuron survival by increasing the activity of the cells to which they 
connect. We tested this idea by monitoring the activity of MGE and 
POA interneurons in the superficial layers of the barrel cortex (S1BF) 
during the period of interneuron cell death. To this end, we generated 
mice expressing the fluorescent reporter tdTomato and the genetically 
encoded calcium sensor GCaMP6s in MGE and POA interneurons 
(Nkx2-1-Cre;RCL^{tdT/GCaMP6s} mice)** and performed long-term Ca²⁺
imaging in the same interneurons from layer 2/3 in S1BF of awake, 
head-restrained pups (Fig. 2a). To select the most appropriate time for 
these experiments, we estimated interneuron cell death in S1BF during 
postnatal development and found comparable dynamics to the rest of
the neocortex (Extended Data Fig. 2). For layer 2/3, we observed the
most prominent decrease in the number of MGE and POA interneurons
between P7 and P8 (Fig. 2b).

Fig. 1 | Consecutive waves of programmed cell death of pyramidal cells
and interneurons in the early postnatal cortex. a, c, Coronal sections
through the primary somatosensory cortex (S1) of Nex^{Cre/+};Fucci2 (a) and
Nkx2-1-Cre;RCL^{tdT} (c) mice during postnatal development. b, Total
number of pyramidal cells in the entire neocortex of Nex^{Cre/+};Fucci2 mice
(ANOVA, F = 4.17, *P = 0.02; n = 4 (P2 and P5), 3 (P7) and 5 (P10 and
P21) mice). d, Total number of MGE and POA interneurons in the entire
neocortex of Nkx2-1-Cre;RCL^{tdT} mice (ANOVA, F = 26.80, *P = 0.01;
n = 4 mice per age). e, Temporal percentage variation in pyramidal cells
and MGE and POA interneurons. f, Total number of MGE and POA
interneurons in superficial (L1–L4) and deep layers (L5 and L6) of the
neocortex (two-way ANOVA, F_interaction = 1.01, *P = 0.03 and **P = 0.002;
n = 3 animals per age). Data are shown as mean ± s.e.m. Scale bars,
100 µm.

We first established our ability to identify surviving interneurons at both time points. As expected, the majority of tdTomato+ interneurons in a region of interest (ROI) were present in the same location the following day (Fig. 2c). However, we also observed that a fraction of tdTomato+ interneurons disappeared between P7 and P8 (Fig. 2c). As MGE and POA interneurons have ceased migration by the end of the first postnatal week, these observations are consistent with the idea that the cells that disappeared between P7 and P8 had undergone apoptosis.
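
The survival call above rests on whether a tdTomato+ cell imaged at P7 is found again at the same position at P8. The sketch below shows one way such a call could be automated, as a greedy nearest-neighbour match between cell centroids from the two sessions. It is an illustration only: the function name, the 10-pixel tolerance and the assumption that the two fields of view are already aligned are hypothetical and are not taken from the study, in which surviving cells were identified from the images themselves (Fig. 2c).

```python
import math

def classify_survival(p7_centroids, p8_centroids, max_shift_px=10.0):
    """Return one boolean per P7 cell: True if it is found again at P8.

    Hypothetical helper. Assumes both centroid lists come from the same,
    already aligned field of view; max_shift_px is an illustrative tolerance.
    """
    unmatched = list(range(len(p8_centroids)))  # P8 cells not yet claimed
    survived = []
    for x7, y7 in p7_centroids:
        best_j, best_d = None, max_shift_px
        for j in unmatched:
            x8, y8 = p8_centroids[j]
            d = math.hypot(x8 - x7, y8 - y7)
            if d <= best_d:  # closest P8 cell within tolerance so far
                best_j, best_d = j, d
        if best_j is None:
            survived.append(False)  # no P8 cell nearby: cell disappeared
        else:
            unmatched.remove(best_j)  # each P8 cell can match only once
            survived.append(True)
    return survived

# Toy example: the second P7 cell has no counterpart at P8.
p7 = [(10, 10), (50, 50), (100, 100)]
p8 = [(12, 11), (101, 98)]
print(classify_survival(p7, p8))  # [True, False, True]
```

Removing each matched P8 cell from the pool prevents two P7 cells from being assigned to the same surviving cell.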

We next investigated whether neural activity at P7 in interneurons 
that die by P8 was different from activity in cells that lived past P8. 
Analysis of calcium event rates (events per min) at P7 indicated that 
interneurons that died at P8 exhibited significantly fewer calcium 
events than neurons that lived past P8 (Fig. 2d, e). We then analysed
whether P7 event rates could discriminate between neurons that died 
at P8 and neurons that lived beyond this day. Receiver-operating 
Fig. 2 | Interneuron activity levels predict cell death. a, Schematic of experimental design. b, Total number of MGE and POA interneurons in layer 2/3 S1BF of Nkx2-1-Cre;RCL^tdTomato mice (n = 3 animals per age). Data are shown as mean ± s.e.m. c, ROI imaged at P7 (left) and P8 (right); individual neurons numbered. d, Raster plots showing the occurrence of calcium events at P7 for the four numbered neurons in c. e, Box plots illustrating event rates for P7 interneurons that lived past P8 (magenta) and interneurons that died by P8. In box plots, the central mark indicates the median, and the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively. Two-sided Mann-Whitney test, P = 0.03; n = 18 for cells that died at P8 and 153 for cells that lived beyond P8, from three pups. f, ROC analysis showing the ability of P7 event rates to discriminate between cells that died by P8 and cells that lived past P8. AUC (area under the curve) = 0.65, *P = 0.025. Scale bar, 15 μm.


characteristic (ROC) analysis revealed that the event rate at P7 performed significantly better than chance at discriminating between these two populations (Fig. 2f). These results suggested that interneurons with relatively low levels of activity immediately before the period of interneuron cell death have an increased probability of undergoing apoptosis.


Pyramidal cells regulate interneuron death 

The previous experiments led us to hypothesize that interneurons receiving abundant or particularly strong inputs during the period of interneuron cell death would have increased chances of survival. As PV+ and SST+ interneurons receive most of their inputs from local pyramidal cells during the first postnatal week, we reasoned that modifying the activity of cortical excitatory neurons during the period of interneuron cell death would influence interneuron survival. To test this idea, we transiently modified the activity of pyramidal cells using a chemogenetic approach based on Designer Receptors Exclusively Activated by Designer Drugs (DREADDs) that induce neuronal activation or inhibition. We injected the primary somatosensory cortex (S1) of P0 Nex^Cre/+ (pyramidal cell-specific) mice with an adeno-associated virus (AAV) encoding mutant G-protein-coupled receptors that induce neuronal activation (hM3Dq) or inhibition (hM4Di) following administration of the pharmacologically inert molecule clozapine-N-oxide (CNO) (Fig. 3a). We then injected pups with CNO twice daily between P5 and P8, and examined the distribution of interneurons at P21 (Fig. 3a). Increasing pyramidal cell activity during the period of interneuron cell death prevented this process and led to a significant increase in the density of PV+ and SST+ interneurons at P21 compared to control mice (Fig. 3b–d). This effect was not due to activity-dependent changes in the expression of PV or SST or in the density of pyramidal cells (Extended Data Fig. 3a, b). Conversely, dampening the activity of pyramidal cells decreased the density of PV+ and SST+ interneurons at P21 compared to control mice, which indicates that interneuron cell death can be bidirectionally modulated by modifying the activity of pyramidal cells (Fig. 3b–d).
In both experiments, changes in the density of interneurons were 
Fig. 3 | Bidirectional modulation of pyramidal cell activity regulates the extent of interneuron cell death. a, Top, schematic of experimental design. Bottom, mCherry expression at P21 following AAV injection at P0. AM, anteromedial thalamic nucleus; AUDv, ventral auditory cortex; Cg, cingulate cortex; CPu, caudate-putamen; GP, globus pallidus; Pir, piriform cortex; Rt, reticular nucleus. b, c, Coronal sections through S1BF from P21 Nex^Cre/+ mice injected with hM3Dq-mCherry or hM4Di-mCherry viruses followed by vehicle or CNO treatment. d, Quantification of the density of PV+ and SST+ cells at P21. Two-tailed Student's unpaired t-test: for hM3Dq, ***P = 0.0002, **P = 0.003; for hM4Di, **P = 0.006 (PV+), **P = 0.004 (SST+); for hM3Dq, n = 7 and 9 mice for −CNO PV+ and SST+, respectively; 6 and 7 mice for +CNO PV+ and SST+, respectively; for hM4Di, n = 7 mice for −CNO and 5 mice for +CNO for both PV+ and SST+. Data are shown as mean ± s.e.m. Scale bars, 500 μm (a) and 100 μm (b).


homogeneously distributed across layers containing PV+ and SST+ interneurons (Extended Data Fig. 3c, d). CNO administration did not cause a redistribution of interneurons from neocortical areas adjacent to the injection site (Extended Data Fig. 3e, f). Instead, we observed a prominent increase in the density of cleaved caspase-3-positive cells following inhibition of the activity of pyramidal cells during the normal period of interneuron cell death (Extended Data Fig. 4a–c). Notably, control experiments revealed that CNO did not modify the density of PV+ or SST+ interneurons in pups not infected with DREADD-expressing AAVs (Extended Data Fig. 4d, e). Similarly, CNO administration between P10 and P13 in mice injected with hM3Dq or hM4Di induced no significant changes in the density of PV+ and SST+ interneurons at P21 (Extended Data Fig. 5). Together, these results demonstrate that pyramidal cell activity is an essential regulator of interneuron survival during the normal period of interneuron cell death.


Interneurons match pyramidal cell numbers 
The previous experiments suggest that pyramidal cells ‘rescue’ appro- 
priate numbers of cortical interneurons from programmed cell death 
Fig. 4 | Survival of pyramidal cells rescues interneuron cell death. a, b, Coronal sections through S1BF from P30 Bak^+/−;Bax^fl/fl and Nex^Cre/+;Bak^−/−;Bax^fl/fl mice (a), and Bak^+/−;Bax^fl/fl and Nkx2-1-Cre;Bak^−/−;Bax^fl/fl mice (b). c, Quantification of the density of PV+ and SST+ cells in pyramidal cell-specific Bax/Bak mutant mice, MGE and POA interneuron-specific Bax/Bak mutant mice and their respective controls at P30. Two-tailed Student's unpaired t-test: for Nex^Cre/+, **P = 0.001, ***P = 0.0002; for Nkx2-1-Cre, *P = 0.04, ***P = 0.00004; n = 4 mice for Nex^Cre/+;Bak^−/−;Bax^fl/fl (PV+) and 5 mice for all other groups. Data are shown as mean ± s.e.m. Scale bar, 100 μm.


through an activity-dependent mechanism. Drawing on this idea, we reasoned that modifying the number of pyramidal cells before the period of interneuron cell death should also influence the number of surviving interneurons. To test this hypothesis, we generated conditional mice in which pyramidal cells specifically lack Bak and Bax, whose combined function is critical for apoptosis. As expected, we observed that the number of excitatory neurons in the cerebral cortex of Nex^Cre/+;Bak^−/−;Bax^fl/fl mutant mice did not decline between P2 and P21 (Extended Data Fig. 6). Consequently, Nex^Cre/+;Bak^−/−;Bax^fl/fl mutant mice had approximately 12% more pyramidal cells than control mice (Fig. 1b and Extended Data Fig. 6).

We next quantified PV+ and SST+ interneurons in S1 of control and Nex^Cre/+;Bak^−/−;Bax^fl/fl mutant mice at P21. The density of both PV+ and SST+ interneurons was roughly 30% higher in Nex^Cre/+;Bak^−/−;Bax^fl/fl mutant mice than in controls (Fig. 4a, c), which suggests that interneuron cell death is suppressed when pyramidal cell death is prevented. This increase was homogeneously distributed across layers containing PV+ and SST+ interneurons (Extended Data Fig. 7a), and was also observed in other neocortical areas (Extended Data Fig. 7c, d). To evaluate whether the increase in the number of PV+ and SST+ interneurons represented the entire population of cells that should normally have died through programmed cell death, we generated conditional mice lacking Bax and Bak in MGE and POA interneurons. We found that the density of PV+ and SST+ interneurons was also approximately 30% higher in Nkx2-1-Cre;Bak^−/−;Bax^fl/fl mutant mice
Fig. 5 | Pyramidal cell activity controls interneuron cell survival through PTEN inhibition. a, P-AKT, AKT and actin protein levels in the neocortex. Friedman test, P = 0.001; *P = 0.03 for P5 versus P8, *P = 0.0101 for P5 versus P9 and ***P = 0 for P5 versus P7; n = 6 mice for each age. b, Coronal section through layer 2/3 S1BF from an Nkx2-1-Cre;RCL^tdTomato mouse at P8. Some interneurons have much higher PTEN levels (filled arrowheads) than most (open arrowheads). c, Coronal sections through layer 2/3 S1BF from Nkx2-1-Cre;RCL^tdTomato mice at P5, P7, P8 and P10. PTEN expression is shown as a custom look-up table (LUT, blue-white-yellow) in tdTomato-masked cells. d, Cumulative distribution of mean PTEN intensity in layer 2/3 MGE interneurons. Kruskal-Wallis,
**% P— 1.7 x 10->4; n= 223 cells (P5), 184 cells (P7), 394 cells (P8) and 395 
cells (P10) from three animals at each age. e, f, Coronal sections through 


than in controls (Fig. 4b, c). Indeed, the fold changes in the density of PV+ and SST+ interneurons were identical for Nex^Cre/+;Bak^−/−;Bax^fl/fl and Nkx2-1-Cre;Bak^−/−;Bax^fl/fl mutant mice (Extended Data Fig. 7b). These results revealed that preventing pyramidal cell death is sufficient to abolish programmed cell death in MGE and POA interneurons, which reinforces the idea that excitatory input from pyramidal cells onto interneurons during early postnatal development is critical for establishing the appropriate ratio of excitatory and inhibitory cells in the cerebral cortex.


PTEN regulates interneuron cell death 

We next investigated the molecular mechanisms through which 
pyramidal cell activity prevents programmed cell death in cortical 
interneurons. In the developing nervous system, the serine/threonine kinase AKT is a critical mediator of neuronal survival that is antagonized by the activity of the phosphatase and tensin homologue PTEN. Consistent with this, we observed that the relative levels of activated AKT (P-AKT/AKT ratio) increased transiently in the neocortex during the period of interneuron cell death (Fig. 5a). Notably, PTEN levels were very heterogeneous among MGE and POA interneurons during the same period (Fig. 5b). PTEN levels were transiently elevated in sparse interneurons in deep and superficial layers of S1, and this increase was concurrent with the peak of interneuron cell death in these layers (Fig. 5c, d and Extended Data Fig. 8a, b). These observations led us to hypothesize that high PTEN levels during this period may drive interneurons towards cell death, and that pyramidal cells might influence this process by regulating PTEN in interneurons.
To test this hypothesis, we generated mice in which we conditionally deleted Pten from postmitotic MGE interneurons. We observed that Lhx6-Cre;Pten^fl/fl mutant mice had abnormally large jaws and reduced body weight compared to their littermates by P16, probably owing to the embryonic expression of Lhx6 in the first branchial arch. These defects prevented their analysis at later developmental stages. We nevertheless found that Pten conditional mutants had a significantly higher density of PV+ and SST+ interneurons in S1 than control mice (Fig. 5e–g and Extended Data Fig. 8c, d), without any difference in relative distribution across layers (Extended Data Fig. 8e). As Lhx6-Cre drives recombination in endothelial cells in addition to MGE interneurons,
we examined whether a change in the organization of neocortical 
blood vessels might contribute to increased survival of interneurons 
in conditional Pten mutants. We found that the density of blood vessels 
was higher in conditional Pten mutants than in controls (Extended 
Data Fig. 8c, d). However, this change did not affect the density of 
pyramidal cells (Extended Data Fig. 8c, d), which rules out an indirect 
effect of blood vessels on interneuron survival through an increase in 
pyramidal cell density. To rule out a direct effect of blood vessels on 
interneuron survival, we carried out a second series of experiments 
using acute pharmacological inhibition of PTEN. We injected the PTEN 
inhibitor bpV(pic) systemically at P7 and P8 into wild-type mice and 
analysed blood vessel density in S1 at P10 (Extended Data Fig. 9b, c). 
Mice injected with the PTEN inhibitor did not exhibit increased blood 
vessel coverage (Extended Data Fig. 9b, c). By contrast, transient PTEN 
inhibition during the period of interneuron cell death increased the 
density of MGE interneurons compared to control mice (Extended 
Data Fig. 9a, d, e). Mice injected with the PTEN inhibitor outside the normal window of interneuron programmed cell death showed densities of PV+ and SST+ interneurons similar to those of controls (Extended Data Fig. 9f–h). These results indicate that PTEN is likely to be required cell-autonomously for interneuron apoptosis during the normal period of interneuron cell death.

Finally, we examined whether pyramidal cell activity influences 
the survival of interneurons by non-cell-autonomously regulating the 
expression of PTEN in these cells during the period of interneuron cell 
death. To this end, we carried out DREADDs experiments similar to 
those that led to an increased number of cortical interneurons following 
transient activation of pyramidal cells between P5 and P8 (Fig. 3), but 
here we analysed PTEN levels in cortical interneurons at P8 (Fig. 5h). 
We found that PTEN levels were significantly decreased in GABAergic 
interneurons following the activation of pyramidal cells (Fig. 5i, j). 
These results strongly suggest that pyramidal cells influence the 
normal programmed cell death of interneurons through the activity- 
dependent inhibition of PTEN, which tips the balance between survival 
and apoptotic signalling pathways in developing interneurons. 


Discussion 

Our results suggest that programmed cell death in interneurons has evolved as a mechanism responsible for adjusting the final ratio of excitatory and inhibitory neurons in the cerebral cortex, a critical milestone in the assembly of cortical circuits. Although synaptic mechanisms are known to stabilize excitatory–inhibitory ratios in cortical circuits, this presupposes that the relative proportions of pyramidal cells and interneurons fall within a certain range. Considering the disproportionate expansion of neocortical areas during human evolution, it is tempting to speculate that the dependency of interneuron survival on pyramidal cells provided an evolutionary advantage for the preservation of appropriate ratios of excitatory and inhibitory cells during the rapid increase in pyramidal cell numbers in the primate lineage.

Our work indicates that interneuron cell death is non-cell-autonomously regulated by pyramidal cells, which seem to be able to rescue connected interneurons from their intrinsically determined cell death by inhibiting the activity of PTEN during a critical window in postnatal development. It is worth noting that a sizable proportion of individuals with autism spectrum disorders (ASD) and macrocephaly carry deleterious mutations in the PTEN gene. Our observations indicate that loss of PTEN function is sufficient to disrupt programmed interneuron cell death, which may in turn alter the cellular balance of excitation and inhibition in the cerebral cortex. This mechanism may contribute to deregulation of cortical information processing and social dysfunction in individuals with ASD who carry PTEN mutations.

The rate of apoptosis in pyramidal cells varies among functionally different neocortical areas and even across layers within the same cortical area. This suggests that the proposed mechanism might sculpt the heterogeneous patterns of interneuron distribution that exist across the cerebral cortex. If so, the regulation of programmed cell death in interneurons by pyramidal cells is likely to contribute to the cytoarchitectonic specialization of cortical areas.
Any Methods, including any statements of data availability and Nature Research reporting summaries, along with any additional references and Source Data files, are available in the online version of the paper at https://doi.org/10.1038/s41586-018-0139-6.
METHODS


Animals. All experiments were performed following the guidelines of King's College London Biological Service Unit and in accordance with the European Community Council Directive of November 24, 1986 (86/609/EEC). Animal work was carried out under licence from the UK Home Office in accordance with the Animals (Scientific Procedures) Act 1986. Both male and female mice were used indiscriminately throughout the study. For stereology on pyramidal cells, Nex^Cre/+ mice (provided by K. A. Nave) were crossed with Fucci2 mice (RCL^Fucci2/+, provided by R. L. Mort). For stereology on MGE and POA interneurons, Nkx2-1-Cre mice (JAX008661) were crossed with RCL^tdTomato/tdTomato mice (JAX007909). For in vivo calcium imaging experiments, Nkx2-1-Cre;RCL^tdTomato/tdTomato mice were crossed with RCL^GCaMP6s/+ mice (JAX024106) to generate Nkx2-1-Cre;RCL^tdTomato/GCaMP6s mice. All DREADDs experiments were conducted in mice obtained by crossing Nex^Cre/+ mice with CD1 mice. To prevent pyramidal cells from undergoing programmed cell death, Bak^−/−;Bax^fl/fl mice (JAX006329) were crossed with Nex^Cre/+ mice and the F1 inter-crossed to obtain Bak^+/−;Bax^fl/fl controls and Nex^Cre/+;Bak^−/−;Bax^fl/fl mutants. For MGE interneurons, a similar breeding scheme used Nkx2-1-Cre mice instead. For the quantification of pyramidal cells in Nex^Cre/+;Bak^−/−;Bax^fl/fl mice, these mutants were crossed with Fucci2 mice to obtain Nex^Cre/+;Bak^−/−;Bax^fl/fl;RCL^Fucci2/+ mutants. Pten^fl/+ mice (JAX006440) were crossed with Lhx6-Cre mice (provided by N. Kessaris) to generate Lhx6-Cre;Pten^fl/+ mice, and F1 inter-crosses led to the production of Lhx6-Cre;Pten^fl/fl mutant mice. Mice were obtained from The Jackson Laboratory unless otherwise stated.

Histology. Mice were anaesthetized with an overdose of sodium pentobarbital 
and transcardially perfused with saline followed by 4% paraformaldehyde (PFA). 
Brains from pups younger than P6 were post-fixed for 4 h while brains from mice 
older than P6 were post-fixed for 2 h at 4 °C. Brains were sectioned either on a sliding microtome at 30 or 40 μm as previously described, or on a vibratome at 40 or 60 μm. All primary and secondary antibodies were diluted in PBS containing 0.25% Triton X-100 and 2% BSA. The following antibodies were used: goat anti-CTGF (1:200, Santa Cruz), rabbit anti-cleaved caspase-3 (1:200, Cell Signalling), rabbit anti-dsRed (1:500, Clontech), goat anti-mCherry (1:500, Antibodies-online), rabbit anti-GABA (1:2,000, Millipore), mouse anti-GABA (1:500, Sigma), mouse anti-NeuN (1:500, Millipore), mouse anti-parvalbumin (1:1,000, Swant), rabbit
anti-parvalbumin (1:5,000, Swant), rat anti-somatostatin (1:300, Millipore) and 
rabbit anti-PTEN (1:500, Abcam). We used Alexa Fluor-conjugated secondary 
antibodies (Invitrogen). For biotin amplification, sections were incubated with 
biotinylated secondary antibody (1:200, Vector labs), followed by Alexa Fluor- 
conjugated Streptavidin (1:200, Vector labs). Blood vessels were stained with 
Isolectin-B4-FITC or Isolectin-B4-Dylight 594 (1:500, Vector labs). 

Stereology. The total numbers of pyramidal neurons and MGE interneurons in the cerebral cortex were estimated using the optical disector method:

$$N = \frac{\sum Q^{-} \times t}{h \times \mathrm{asf} \times \mathrm{ssf}}$$

where ΣQ⁻ is the total number of cells counted, t the mean section thickness, h the height of the optical disector (17 μm for pyramidal neuron stereology, 18 μm for MGE stereology) after excluding guard zones of 1 μm above and below the disector, asf the area sampling fraction and ssf the section sampling fraction (frequency of sampling). An ApoTome (Zeiss) equipped with a motorized stage and colour camera was connected to a computer with Stereo Investigator software (MBF Bioscience). The boundaries of the neocortex were first defined with a 2.5× objective (Zeiss).
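
As a worked illustration of the estimator above, the function below evaluates N from its inputs. The numbers in the example are made up for illustration and are not data from this study; in practice t is the measured mean section thickness, which must exceed h plus the two 1-μm guard zones.

```python
def optical_disector_estimate(sum_q, t_um, h_um, asf, ssf):
    """Estimated total cell number N = sum(Q-) * t / (h * asf * ssf).

    sum_q: total number of cells counted in all disectors
    t_um:  mean section thickness (um)
    h_um:  height of the optical disector (um), guard zones excluded
    asf:   area sampling fraction; ssf: section sampling fraction
    """
    if not (0 < asf <= 1 and 0 < ssf <= 1):
        raise ValueError("sampling fractions must lie in (0, 1]")
    return sum_q * t_um / (h_um * asf * ssf)

# Illustrative (not measured) values: 500 cells counted, t = 19 um,
# h = 17 um (pyramidal-cell protocol), asf = 0.0014, ssf = 0.125.
n_hat = optical_disector_estimate(500, 19.0, 17.0, 0.0014, 0.125)
print(round(n_hat))  # 3193277
```

The ratio t/h scales the counts up from the disector height to the full section, while dividing by asf and ssf scales them up from the sampled fraction of each section and of the section series.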

For pyramidal neurons, sampling was performed with a 63× objective (Zeiss, numerical aperture (NA) 1.4). The counting frame was set at 15 × 15 μm² and the grid size at 400 × 400 μm². The sampling parameters were as follows: asf = 0.0014, ssf = 0.25 (P2); 0.125 (all other ages). For MGE interneurons, sampling was performed with a 40× objective (Zeiss, NA 1.3). For the entire cortex stereology analysis, the counting frame was set at 125 × 125 μm² and the grid size at 900 × 900 μm² (P2), 1,200 × 1,200 μm² (all other ages). The sampling parameters were as follows: asf = 0.019 (P2); 0.011 (all other ages), ssf = 0.125. For the upper/lower cortical layer stereology analysis, the counting frame was set at 125 × 125 μm² and the grid size at 900 × 900 μm². The sampling parameters were as follows: asf = 0.019, ssf = 0.125. For the stereological analysis of the barrel field, the counting frame was set at 200 × 200 μm² and the grid size at 350 × 350 μm². The sampling parameters were as follows: asf = 0.3265, ssf = 0.125.
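
The area sampling fractions listed above follow from the counting frame and grid sizes, asf = (counting-frame area)/(grid area). The short check below recomputes them from the stated dimensions; the dictionary keys are labels chosen here for readability and are not terms from the study.

```python
def area_sampling_fraction(frame_um, grid_um):
    """asf = counting-frame area / sampling-grid area (both in um^2)."""
    (fx, fy), (gx, gy) = frame_um, grid_um
    return (fx * fy) / (gx * gy)

# Counting-frame and grid sizes as stated in the Methods text.
protocols = {
    "pyramidal cells": ((15, 15), (400, 400)),
    "MGE/POA, whole cortex, P2": ((125, 125), (900, 900)),
    "MGE/POA, whole cortex, other ages": ((125, 125), (1200, 1200)),
    "MGE/POA, upper/lower layers": ((125, 125), (900, 900)),
    "MGE/POA, barrel field": ((200, 200), (350, 350)),
}
for name, (frame, grid) in protocols.items():
    print(f"{name}: asf = {area_sampling_fraction(frame, grid):.4f}")
```

The printed values (0.0014, 0.0193, 0.0109, 0.0193 and 0.3265) agree, after rounding, with the asf values quoted in the text.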

In vivo imaging. P6 mice were anaesthetized with 2% isoflurane and held in a nose clamp. Isoflurane concentration during surgery was maintained between 1% and 2%, and body temperature was maintained at 37 °C by a heating pad. The scalp was cleaned with betadine and cut open to expose the skull covering the dorsal neocortex. The periosteal tissue surrounding the skull was gently scraped with a scalpel. The skull was cleaned with betadine and Ringer's solution. A circular custom-made metal head-post (Luigs and Neumann) was attached over the left hemisphere with cyanoacrylate glue (Henkel). A thin protective layer of glue was applied over the skull. The glue was allowed to dry for 10 min. Dental cement (Paladur) was used to reinforce the attachment of the head-post to the skull. The mouse was injected with buprenorphine (2 μl/g of a 50 μg/ml solution) and returned to its home cage.

At P7, the animal was anaesthetized and head-restrained in a custom-made head holder. A 3-mm craniotomy was opened over the posterior-lateral neocortex. This craniotomy encompassed the primary somatosensory cortex (S1). Care was taken not to damage the dura mater. A circular coverslip (3 mm diameter, Harvard Apparatus) was placed over the craniotomy and its edges were sealed with cyanoacrylate glue and reinforced with dental cement. Following surgery, dexamethasone (5 μl/g of a 38 μg/ml solution) was injected subcutaneously. The animal was allowed to recover for at least 2 h in its home cage, after which imaging commenced at P7.

Imaging sessions lasted 40–60 min, and we imaged the same field of view on consecutive days in three mice. tdTomato and GCaMP6s were excited using a Ti:sapphire laser (Coherent Chameleon) tuned to λ = 930 nm. The emitted photons were collected by two GaAsP detectors through a 20× objective (Olympus, 1.0 NA). The field of view (FOV) measured 385 × 385 μm (512 × 512 pixels). The scan speed was set to 30 Hz and image sequences were obtained in sweeps of 1 min (1,800 images per channel per min). The average excitation power was between 40 and 50 mW, and this was kept constant over all imaging days.
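
A quick check of the acquisition numbers above: at 30 Hz a 1-min sweep yields 1,800 frames per channel, and a 385-μm field sampled at 512 pixels gives a pixel size of roughly 0.75 μm, so the 20-pixel ROIs used for the calcium analysis span about 15 μm. The constant and function names below are illustrative only.

```python
FOV_UM = 385.0        # field of view width (um)
FOV_PX = 512          # pixels across the field of view
FRAME_RATE_HZ = 30.0  # scan speed
SWEEP_S = 60.0        # one sweep = 1 min
ROI_DIAMETER_PX = 20  # circular ROI diameter used for calcium analysis

def acquisition_summary():
    """Derived acquisition quantities from the stated imaging parameters."""
    pixel_um = FOV_UM / FOV_PX
    return {
        "pixel_size_um": pixel_um,
        "frames_per_sweep": int(round(FRAME_RATE_HZ * SWEEP_S)),
        "roi_diameter_um": ROI_DIAMETER_PX * pixel_um,
    }

for key, value in acquisition_summary().items():
    print(f"{key}: {value:.3f}")
```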

To correct for motion artefacts, image registration was carried out using cus- 
tom-written spatial cross-correlation methods on the tdTomato channel. In brief, 
on every 1-min sweep, a part of the tdTomato image sequence, where the animal 
was not moving, was chosen and 20 frames were averaged to give a non-moving 
reference image. Every frame of the tdTomato image sequence was spatially 
cross-correlated to this reference image and offset along the x- and y-axes to match 
the cross-correlation peak. The offsets obtained for each tdTomato frame were applied to the corresponding GCaMP6s frame.
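
The registration step above can be sketched with NumPy as rigid, whole-pixel alignment by FFT-based circular cross-correlation. This is a minimal sketch under stated assumptions, not the authors' custom code: it assumes pure integer translations, ignores sub-pixel offsets and the edge wrap-around introduced by np.roll, and takes the start index of the still period (still_start, a hypothetical parameter) as given rather than detecting it.

```python
import numpy as np

def make_reference(td_frames, still_start, n_avg=20):
    """Average n_avg tdTomato frames from a still period into a reference."""
    return td_frames[still_start:still_start + n_avg].mean(axis=0)

def estimate_shift(frame, reference):
    """Integer (dy, dx) such that np.roll(frame, (dy, dx)) best matches reference."""
    f = np.fft.fft2(frame - frame.mean())
    r = np.fft.fft2(reference - reference.mean())
    xcorr = np.fft.ifft2(r * np.conj(f)).real  # circular cross-correlation
    peak = np.unravel_index(np.argmax(xcorr), xcorr.shape)
    # Peak positions past the midpoint correspond to negative shifts.
    return tuple(int(p) if p <= n // 2 else int(p) - n
                 for p, n in zip(peak, xcorr.shape))

def register_sweep(td_frames, gcamp_frames, still_start):
    """Align each tdTomato frame to the reference and apply the same
    offsets to the matching GCaMP6s frame."""
    reference = make_reference(td_frames, still_start)
    td_out = np.empty_like(td_frames)
    gc_out = np.empty_like(gcamp_frames)
    for i, (td, gc) in enumerate(zip(td_frames, gcamp_frames)):
        shift = estimate_shift(td, reference)
        td_out[i] = np.roll(td, shift, axis=(0, 1))
        gc_out[i] = np.roll(gc, shift, axis=(0, 1))
    return td_out, gc_out

# Synthetic check: a random "reference" image displaced by (3, -2) pixels.
rng = np.random.default_rng(0)
ref = rng.random((64, 64))
moved = np.roll(ref, (3, -2), axis=(0, 1))
print(estimate_shift(moved, ref))  # (-3, 2)
```

Because the offsets are estimated only on the structural tdTomato channel, activity-dependent changes in GCaMP6s brightness cannot bias the alignment.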

Calcium imaging analysis. Circular ROIs (diameter 20 pixels) were manually drawn around tdTomato-expressing cell bodies, and the mean GCaMP6s fluorescence intensity over time was extracted. Changes in fluorescence signal were calculated as ΔF/F0, where the baseline fluorescence (F0) is the mode of a kernel density estimate of F (ksdensity function in Matlab). Calcium events were detected by setting a threshold of 3% change in fluorescence from baseline.
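
The baseline and event definitions above can be sketched as follows. This is a hedged illustration, not the authors' Matlab code: the Gaussian kernel with a Silverman rule-of-thumb bandwidth is an assumption (Matlab's ksdensity applies its own default bandwidth), and counting each upward crossing of the 3% threshold as one event is one possible reading of the detection rule. Event rates (events per min) then follow by dividing the count by the recording length in minutes.

```python
import numpy as np

def kde_mode(values, n_grid=512):
    """Mode of a Gaussian kernel density estimate of `values`."""
    v = np.asarray(values, dtype=float)
    bandwidth = 1.06 * v.std() * v.size ** (-1 / 5)  # Silverman's rule (assumed)
    if bandwidth == 0:
        return float(v[0])  # constant trace: the mode is that value
    grid = np.linspace(v.min(), v.max(), n_grid)
    density = np.exp(-0.5 * ((grid[:, None] - v[None, :]) / bandwidth) ** 2).sum(axis=1)
    return float(grid[np.argmax(density)])

def delta_f_over_f(trace):
    """dF/F0, with F0 taken as the KDE mode of the raw fluorescence trace."""
    f = np.asarray(trace, dtype=float)
    f0 = kde_mode(f)
    return (f - f0) / f0

def count_events(dff, threshold=0.03):
    """Count upward crossings of the threshold (3% above baseline)."""
    above = np.asarray(dff) > threshold
    onsets = above & ~np.concatenate(([False], above[:-1]))
    return int(onsets.sum())

# Synthetic 1-min trace at 30 Hz: baseline ~100 with two transients.
rng = np.random.default_rng(1)
trace = rng.normal(100.0, 0.5, 1800)
trace[300:330] += 20.0
trace[900:950] += 30.0
events = count_events(delta_f_over_f(trace))
print(events, "events;", events / (trace.size / 30.0 / 60.0), "events per min")
```

Using the mode rather than the mean as F0 keeps the baseline close to the resting fluorescence even when a trace contains many large transients.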

Receiver-operating characteristic curves. To identify whether the calcium event 
rate (events per min) at P7 could act as a binary classifier in distinguishing cells that 
would be alive or dead at P8, we plotted the ROC curve by varying the discrimination threshold applied to the P7 event rate and calculated the AUC. To test
for statistical significance, the cell labels were randomly shuffled 5,000 times. On 
each shuffle, we calculated the ROC curve and the corresponding AUC. We then 
compared our observed AUC to the distribution of shuffled AUCs. The P value is 
the fraction of shuffled AUCs > observed AUC. 
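
The ROC procedure above can be sketched in plain Python. The sketch relies on the identity that the area under an empirical ROC curve equals the Mann–Whitney U statistic divided by the number of positive–negative pairs (ties counted as one half), so the threshold sweep does not need to be carried out explicitly. Treating "lived past P8" as the positive class and the fixed seed are assumptions made here; the event rates in the example are invented for illustration.

```python
import random

def roc_auc(scores, labels):
    """AUC via the rank-sum (Mann-Whitney U) identity; ties get average ranks."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank of the tie block
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y)
    n_neg = len(pairs) - n_pos
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

def shuffle_test(scores, labels, n_shuffles=5000, seed=0):
    """Observed AUC and the fraction of label-shuffled AUCs exceeding it."""
    rng = random.Random(seed)
    observed = roc_auc(scores, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if roc_auc(scores, shuffled) > observed:
            exceed += 1
    return observed, exceed / n_shuffles

# Invented P7 event rates (events/min); label 1 = lived past P8, 0 = died.
rates = [0.2, 0.5, 0.4, 1.1, 0.9, 1.3, 0.8, 1.5, 0.3, 1.0]
lived = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
auc, p = shuffle_test(rates, lived)
print(f"AUC = {auc:.2f}, P = {p:.3f}")
```

Shuffling the labels while keeping the event rates fixed preserves the class sizes (here 3 versus 7), which the null distribution of AUCs depends on.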

Intracranial injections. pAAV8-hSyn-DIO-hM3D(Gq)-mCherry and pAAV8-hSyn-DIO-hM4D(Gi)-mCherry were gifts from B. Roth (Addgene plasmids #44361 and #44362). P0 mice were anaesthetized with isoflurane and mounted in a stereotaxic frame. Pups were injected with 600 nl of virus diluted in PBS and coloured with 0.5% Fast Green (Sigma). Injections were targeted to the somatosensory cortex at an injection rate of 10 nl/s.

Drugs. For DREADDs experiments, CNO (Tocris) was dissolved in 5% dimethyl sulfoxide (Sigma) and then diluted with 0.9% saline to either 0.1 mg/ml or 0.5 mg/ml. Pups were injected with vehicle (0.05% DMSO) or CNO (10 μl per g) subcutaneously for 4 days, twice daily. For the PTEN inhibitor experiments, dipotassium bisperoxovanadium(pic) dehydrate (bpV(pic), Sigma) was dissolved in 0.9% saline to 0.2 mg/ml. Pups were injected with vehicle (0.9% saline) or bpV(pic) (10 μl per g) intraperitoneally for 2 days, twice daily. All treatments for CNO and PTEN inhibitor experiments were randomly assigned.
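
Because the drugs above are dosed by volume per body weight, the administered dose follows from a unit conversion: concentration (mg/ml) × volume (μl/g) = dose (mg/kg), since 1 μl/g equals 1 ml/kg. The helper below applies this to the stated CNO and bpV(pic) solutions and, for completeness, to the buprenorphine and dexamethasone injections described under In vivo imaging. The function names and the 4-g example pup are illustrative assumptions.

```python
def dose_mg_per_kg(conc_mg_per_ml, volume_ul_per_g):
    """1 ul/g == 1 ml/kg, so mg/ml x ul/g gives mg/kg directly."""
    return conc_mg_per_ml * volume_ul_per_g

def injection_volume_ul(body_weight_g, volume_ul_per_g):
    """Volume to inject into an animal of the given body weight."""
    return body_weight_g * volume_ul_per_g

regimens = {
    "CNO, 0.1 mg/ml": (0.1, 10),     # 10 ul/g
    "CNO, 0.5 mg/ml": (0.5, 10),     # 10 ul/g
    "bpV(pic)": (0.2, 10),           # 0.2 mg/ml, 10 ul/g
    "buprenorphine": (0.050, 2),     # 50 ug/ml, 2 ul/g
    "dexamethasone": (0.038, 5),     # 38 ug/ml, 5 ul/g
}
for name, (conc, vol) in regimens.items():
    print(f"{name}: {dose_mg_per_kg(conc, vol):.2f} mg/kg")

# Hypothetical 4-g pup receiving 10 ul/g:
print(injection_volume_ul(4.0, 10), "ul")
```

The two CNO stocks therefore correspond to 1 and 5 mg/kg, and bpV(pic) to 2 mg/kg, per injection.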

Western blotting. Mouse somatosensory cortex tissue was homogenized in RIPA 
lysis buffer containing 50 mM Tris pH 8, 150 mM NaCl, 2 mM EDTA, 0.5% 
sodium deoxycholate, 0.1% SDS, 1% NP-40 and 1x protease inhibitor cocktail 
(cOmplete, Sigma). Samples were denatured in Laemmli sample buffer and run on 10% SDS-PAGE gels. Separated proteins were electrophoretically transferred onto PVDF membranes. Membranes were blocked with 5% BSA in TBST (20 mM Tris-HCl pH 7.5, 150 mM NaCl and 0.1% Tween 20) for 1 h and probed with rabbit anti-P-Akt (Ser473, Cell Signalling, 1:1,000) overnight at 4 °C, followed
by an HRP-conjugated donkey anti-rabbit antibody (Thermo Fisher, 1:10,000). 
The blots were developed using ECL femto western blotting detection reagents 
and following read-out, they were stripped (Thermo Fisher). After confirming 
stripping efficiency, an HRP-conjugated mouse anti-Akt antibody (Cell Signalling, 
1:1,000) was added overnight at 4°C. The blots were developed using ECL western 
blotting detection reagents, the signals were registered and, following stripping, 
an HRP-conjugated rabbit anti-Actin (Sigma, 1:20,000) was added for 1 h at room 
temperature. Pico ECL western blotting reagent was used to detect actin levels. 
Signals were read on a LI-COR Odyssey Imaging Band and intensities were analysed using ImageStudioLite.
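
The readout in Fig. 5a is the ratio of phosphorylated to total AKT. A minimal sketch of how band intensities exported from an image-analysis tool could be turned into per-age ratios is given below. The data layout, the choice to express each age relative to the P5 mean and the intensity values are assumptions for illustration, not the study's data or normalization.

```python
from statistics import mean

def pakt_akt_ratios(bands):
    """bands: {age: [(pakt_intensity, akt_intensity), ...]} -> {age: [ratios]}."""
    return {age: [p / a for p, a in samples] for age, samples in bands.items()}

def relative_to(ratios, reference_age):
    """Mean P-AKT/AKT ratio per age, divided by the mean at reference_age."""
    ref = mean(ratios[reference_age])
    return {age: mean(values) / ref for age, values in ratios.items()}

# Invented band intensities (arbitrary units), two mice per age.
bands = {
    "P5": [(100.0, 100.0), (120.0, 100.0)],
    "P7": [(210.0, 100.0), (230.0, 100.0)],
    "P9": [(130.0, 100.0), (110.0, 100.0)],
}
for age, value in relative_to(pakt_akt_ratios(bands), "P5").items():
    print(age, round(value, 2))
```

Taking the ratio within each sample, before averaging, cancels lane-to-lane differences in total protein loading that affect the P-AKT and AKT bands equally.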

Image acquisition. Images used for analysis were obtained from an ApoTome 
(Zeiss), epifluorescence microscope (Leica), or SP8 confocal microscope (Leica). 
ApoTome images were taken using the ApoTome function in Zen2 software. 
Images obtained with the confocal and epifluorescence microscope were taken 
using LAS AF software. 

Cell counting. Cortical layers were identified on the basis of their distinct histo- 
logical characteristics. Layer 1 was identified as a sparsely populated cell layer. The 
border between layers 2/3 and 4 was distinguished by the higher nuclei density of 
layer 4. Layer 5 was identified as the layer below layer 4 and above layer 6; it contains less densely packed nuclei. Cell density within cortical layers was quantified either manually or using custom routines written in Matlab (MathWorks). For manual quantification, all analyses were conducted blind and cells were counted in a rectangular area, 551.5 μm wide at the pia surface, within the somatosensory
cortex, auditory cortex or motor cortex. Cells were counted without using pseudo- 
colour in Fiji. Automatic quantification was carried out blinded and using mor- 
phological operations for image segmentation. 

To measure PTEN staining intensity in tdTomato+ or GABA+ interneurons, custom-designed CellProfiler pipelines were used. In brief, tdTomato+ or GABA+ interneurons were identified as primary objects using the global Otsu thresholding method, and any objects outside the pre-set diameter range (25–100 pixels) were excluded. PTEN intensity was measured under this cell mask.
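
The CellProfiler step above (global Otsu threshold, size filter, intensity under the mask) can be approximated in NumPy as shown below. This is a sketch under assumptions, not the authors' pipeline: it uses 4-connected components, measures object size as the diameter of a circle with the same area, and omits the declumping and smoothing options CellProfiler offers. The function names and the synthetic test image are hypothetical.

```python
from collections import deque
import numpy as np

def otsu_threshold(image, nbins=256):
    """Global Otsu threshold: maximize between-class variance of the histogram."""
    hist, edges = np.histogram(image.ravel(), bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                  # pixels at or below each candidate split
    w1 = w0[-1] - w0                      # pixels above the split
    s0 = np.cumsum(hist * centers)
    m0 = s0 / np.maximum(w0, 1)
    m1 = (s0[-1] - s0) / np.maximum(w1, 1)
    between = w0 * w1 * (m0 - m1) ** 2
    return edges[np.argmax(between) + 1]  # upper edge of the last background bin

def label_components(mask):
    """4-connected component labelling by breadth-first search."""
    labels = np.zeros(mask.shape, dtype=int)
    n = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue
        n += 1
        labels[start] = n
        queue = deque([start])
        while queue:
            r, c = queue.popleft()
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
                        and mask[nr, nc] and not labels[nr, nc]):
                    labels[nr, nc] = n
                    queue.append((nr, nc))
    return labels, n

def pten_per_cell(cell_image, pten_image, min_diam=25, max_diam=100):
    """Mean PTEN intensity inside each size-filtered cell object."""
    labels, n = label_components(cell_image >= otsu_threshold(cell_image))
    means = []
    for k in range(1, n + 1):
        obj = labels == k
        diameter = 2 * np.sqrt(obj.sum() / np.pi)  # equivalent circular diameter
        if min_diam <= diameter <= max_diam:
            means.append(float(pten_image[obj].mean()))
    return means

# Synthetic image: one ~40-px cell (kept) and one ~10-px object (rejected).
yy, xx = np.mgrid[:200, :200]
big = (yy - 50) ** 2 + (xx - 50) ** 2 <= 20 ** 2
small = (yy - 150) ** 2 + (xx - 150) ** 2 <= 5 ** 2
cells = (big | small).astype(float)
pten = np.where(big, 7.0, 3.0)
print(pten_per_cell(cells, pten))  # [7.0]
```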

Blood vessel analysis. The fraction of the total area covered by blood vessels and the average vessel diameter were quantified blind using 'Vessel Analysis', an ImageJ plugin (http://imagej.net/Vessel_Analysis; N. Govindaraju and M. Elfarnawany).

Statistical analyses. Unless otherwise specified, results were plotted and tested for
statistical significance using Prism 7. Samples were tested for normality using the
Shapiro–Wilk test. Unpaired comparisons were analysed using two-tailed unpaired
Student's t-tests (normally distributed data) or Mann–Whitney tests (non-normally
distributed data). Multiple comparisons with a single variable were analysed using
one-way ANOVA with post hoc Tukey's test (comparing the mean of each column with
the mean of every other column) or Dunnett's test (comparing the mean of each column
with the mean of a control column) for normally distributed samples. For samples with
a non-parametric distribution, either the Kruskal–Wallis test (single measures) or
Friedman's test (repeated measures) was performed, followed by the post hoc Dunn's
test. For multiple comparisons with more than one variable, a two-way ANOVA with
post hoc Sidak's test was used. The cumulative distributions of PTEN intensity levels
were compared using the Kolmogorov–Smirnov test. Analysis of calcium event rates was
carried out in Matlab. In box plots, the central mark indicates the median, and the
bottom and top edges of the box indicate the 25th and 75th percentiles, respectively.
The whiskers extend to the most extreme data points not considered outliers. No
statistical methods were used to predetermine sample size; sample sizes were based
on similar published studies. All experiments were replicated in at least two different
litters. Unless otherwise stated, the experiments were not randomized (that is,
assignments were based on genotypes) and the investigators were not blinded to
allocation during experiments and outcome assessment.
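The Kolmogorov–Smirnov comparison of cumulative PTEN intensity distributions rests on the two-sample statistic D, the largest vertical gap between the two empirical cumulative distribution functions. The stdlib sketch below computes only D (the function name is hypothetical); it does not compute the p-value that Prism reports. In Python, `scipy.stats.ks_2samp` returns both D and a p-value.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    i = j = 0
    d = 0.0
    # Walk through the pooled sorted values; after consuming every value <= x
    # from each sample, i/n and j/m are the two empirical CDFs evaluated at x.
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


# Hypothetical per-cell PTEN intensities from two ages.
print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))
```

Once either sample is exhausted, its CDF is already 1 and the other CDF can only rise towards 1, so the gap cannot grow further. That is why the loop may stop early.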

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. All data and/or analyses generated during the current study are 
available from the corresponding author upon reasonable request. 

Code availability. For automatic quantification, the code was written in Matlab 
(Mathworks) and is available from the corresponding author on reasonable request. 
[Extended Data Fig. 1 image panels: P4 NexCre/+;Fucci2 (mCherry, cleaved caspase-3); P7 Nkx2-1-Cre;RCLtdTomato (tdTomato, cleaved caspase-3); NexCre/+;Fucci2 (mCherry, CTGF).]
Extended Data Fig. 1 | Extensive cell death in layer 2–6 pyramidal cells.
a, Coronal sections through the S1 cortex of P4 NexCre/+;Fucci2 (left)
and P7 Nkx2-1-Cre;RCLtdTomato (right) mice immunostained for cleaved
caspase-3 (yellow) and mCherry (green, left) or tdTomato (magenta,
right). b, Quantification of the density of cleaved caspase-3+ cells among pyramidal
neurons (left, green) and MGE interneurons (right, magenta) during
postnatal development (for pyramidal neurons, ANOVA, F = 73.6,
***P = 0.003 (P2 versus P4), ***P = 0.00006 (P4 versus P7), n = 3 mice
for all ages; for MGE interneurons, ANOVA, F = 16.91, *P = 0.027
(P5 versus P7), **P = 0.0029 (P7 versus P10), n = 3 animals for all ages).
c, Coronal sections through the barrel cortex of NexCre/+;Fucci2 mice
during postnatal development immunostained for mCherry (green) and
CTGF (yellow). d, Total number of pyramidal cells, excluding subplate
cells, in the neocortex of NexCre/+;Fucci2 mice (ANOVA, F = 4.83 and
*P = 0.03; n = 3 mice for P2 and P5, and 4 mice for P3, P4 and P21).
e, Temporal variation in the percentage of pyramidal cells, excluding the
subplate contribution, during postnatal development. Data are shown as
mean ± s.e.m. Scale bars, 100 µm.
Extended Data Fig. 2 | Interneuron cell loss in the barrel field during
postnatal development. a, Coronal sections through S1BF of
Nkx2-1-Cre;RCLtdTomato mice (magenta, MGE interneurons) during postnatal
development counterstained with DAPI (grey). b, Total number of MGE
and POA interneurons in S1BF of Nkx2-1-Cre;RCLtdTomato mice during
postnatal development (ANOVA, F = 6.40 and *P = 0.03; n = 4 animals for
each age). Data are shown as mean ± s.e.m. Scale bar, 100 µm.
Extended Data Fig. 3 | Alteration of pyramidal cell activity affects
interneuron density but not distribution. a, Coronal sections through
S1BF cortex immunostained for GABA (magenta) and NeuN (green)
and counterstained with DAPI (grey) from P21 NexCre/+ mice injected
with hM3Dq-mCherry virus followed by vehicle or CNO treatment.
b, Quantification of the density of GABA+ (left) and NeuN+ GABA−
(right) cells in P21 mice injected with hM3Dq-mCherry followed by
vehicle (grey) or CNO (magenta) treatment (two-tailed unpaired Student's
t-test, **P = 0.005 (GABA), P = 0.68 (NeuN+ GABA−), n = 4 animals
for vehicle, n = 3 animals for CNO conditions). c, d, Quantification of
the distribution of PV+ (left) and SST+ neurons (right) in P21 NexCre/+
mice injected at P0 with hM3Dq-mCherry (c) or hM4Di-mCherry (d)
and treated with vehicle (grey) or CNO (magenta) during P5–P8 (two-way
ANOVA, Ftreatment = 0.48, P = 0.50 (hM3Dq PV), Ftreatment = −0.04,
P = 0.99 (hM3Dq SST), Ftreatment = 0.88, P = 0.37 (hM4Di PV),
Ftreatment = 0.79, P = 0.39 (hM4Di SST); for PV, n = 7 animals for hM3Dq
and hM4Di −CNO, 6 animals for hM3Dq +CNO, and 5 animals for
hM4Di +CNO; for SST, n = 9 animals for hM3Dq −CNO, 7 animals for
hM3Dq +CNO and hM4Di −CNO, and 5 animals for hM4Di +CNO).
e, Coronal sections through auditory cortex immunostained for PV
(magenta) or SST (magenta) and counterstained with DAPI (grey) from
P21 NexCre/+ mice injected with hM3Dq-mCherry virus followed by vehicle
or CNO treatment. f, Quantification of the density of PV+ (right) and
SST+ neurons (left) in auditory cortex in P21 mice injected with
hM3Dq-mCherry followed by vehicle (grey) or CNO (magenta) treatment
(two-tailed unpaired Student's t-test, P = 0.574 (PV), P = 0.419 (SST), n = 4
animals for both). Data are shown as mean ± s.e.m. Scale bars, 100 µm.
Extended Data Fig. 5 | Alteration of pyramidal cell activity beyond the
normal period of interneuron cell death does not affect interneuron
survival or distribution. a, Schematic of experimental design.
b, c, Coronal sections through S1BF immunostained for PV (b) or SST
(c) and counterstained with DAPI (grey) from P21 NexCre/+ mice injected
with hM3Dq-mCherry (left) or hM4Di-mCherry (right) viruses followed
by vehicle or CNO treatment. d, g, Quantification of the density of PV+
(d) and SST+ (g) cells in P21 hM3Dq-mCherry-injected mice (left bars)
and hM4Di-mCherry-injected mice (right bars) followed by vehicle (grey
bars) or CNO (magenta bars) treatment at P10–P13 (for PV, two-tailed
unpaired Student's t-test, P = 0.99 and P = 0.087, respectively; for SST,
two-tailed unpaired Student's t-test, P = 0.56 and P = 0.37, respectively;
n = 4 animals for hM3Dq −CNO and 3 animals for all other groups).
e, f, h, i, Quantification of the distribution of PV+ (e, f) and SST+ cells
(h, i) in mice injected with hM3Dq-mCherry (e, h) or hM4Di-mCherry
(f, i) followed by vehicle (grey bars) or CNO (magenta bars) treatment
at P10–P13 (two-way ANOVA, Ftreatment = 0.15, P = 0.71 (hM3Dq PV),
Ftreatment = 0.60, P = 0.48 (hM3Dq SST), Ftreatment = 1.00, P = 0.37 (hM4Di
PV), Ftreatment = 1.78, P = 0.25 (hM4Di SST); n = 4 animals for hM3Dq
−CNO and 3 animals for all other groups). Data are shown as mean ± s.e.m.
Scale bar, 100 µm.
Extended Data Fig. 6 | Loss of BAK and BAX prevents programmed
cell death in pyramidal cells. a, Coronal sections through S1BF from P2
and P21 NexCre/+;Bak−/−;Baxfl/fl;Fucci2 mice immunostained for mCherry
(green) and CTGF (yellow). b, Total number of pyramidal cells (excluding
subplate cells) in the neocortex of P2 and P21 NexCre/+;Bak−/−;Baxfl/fl;
Fucci2 mice (two-tailed unpaired Student's t-test, P = 0.30; n = 3 animals
for both ages). Data are shown as mean ± s.e.m. Scale bar, 100 µm.


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


ARTICLE 


[Extended Data Fig. 7 plots: a, distribution of interneurons across cortical layers 1–6 in control, NexCre/+;Bak−/−;Baxfl/fl and Nkx2-1-Cre;Bak−/−;Baxfl/fl mice; b, fold change in interneuron density; c, DAPI/PV and DAPI/SST images; d, PV and SST cell densities.]


Extended Data Fig. 7 | Loss of BAK and BAX in pyramidal cells or
MGE and POA interneurons affects densities but not lamination
of MGE and POA interneurons. a, Quantification of the distribution
of PV+ (left) and SST+ (right) interneurons in P30 control (grey),
NexCre/+;Bak−/−;Baxfl/fl (dark magenta) and Nkx2-1-Cre;Bak−/−;Baxfl/fl
(light magenta) mice (two-way ANOVA, Ftreatment = 3.56, P = 0.10
(NexCre/+ PV), Ftreatment = 0.44, P = 0.53 (Nkx2-1-Cre PV), Ftreatment = 0,
P = 0.99 (NexCre/+ SST), Ftreatment = 0.44, P = 0.54 (Nkx2-1-Cre SST),
n = 4 animals for NexCre/+;Bak−/−;Baxfl/fl (PV) and 5 animals for all other
groups). b, Quantification of the fold change in the density of PV+ (top)
and SST+ (bottom) interneurons in NexCre/+;Bak−/−;Baxfl/fl (dark magenta)
and Nkx2-1-Cre;Bak−/−;Baxfl/fl (light magenta) mice compared to their
respective controls (two-tailed unpaired Student's t-test, P = 0.90 (PV),
P = 0.67 (SST); for PV, n = 4 animals for NexCre/+;Bak−/−;Baxfl/fl and
6 animals for Nkx2-1-Cre;Bak−/−;Baxfl/fl; for SST, n = 5 animals for
both NexCre/+;Bak−/−;Baxfl/fl and Nkx2-1-Cre;Bak−/−;Baxfl/fl).
c, Coronal sections through the motor cortex of P30 control and
NexCre/+;Bak−/−;Baxfl/fl mice immunostained for parvalbumin (PV, left)
and somatostatin (SST, right) and counterstained with DAPI (grey).
d, Quantification of the density of PV+ (left) and SST+ (right) cells in
the motor cortex of control and pyramidal cell-specific Bax/Bak double
mutant mice at P30 (two-tailed unpaired Student's t-test, *P = 0.02 (PV),
*P = 0.01 (SST); for PV, n = 4 animals for both; for SST, n = 3 animals for
both). Data are shown as mean ± s.e.m. Scale bar, 100 µm.
Extended Data Fig. 8 | PTEN expression in deep-layer cortical
interneurons and effects of loss of PTEN function on neurons and
blood vessels. a, Coronal sections through layer 5 of S1BF from
Nkx2-1-Cre;RCLtdTomato mice at P5, P7, P8 and P10, immunostained for PTEN and
counterstained with DAPI (grey). PTEN expression is shown as a custom
LUT in tdTomato-masked cells. b, Cumulative distribution of mean PTEN
intensity in layer 5 and 6 MGE and POA interneurons (Kruskal–Wallis
test, ***P = 0; n = 7,270 cells (P5), 4,544 cells (P7), 6,780 cells (P8) and
5,043 cells (P10) from 3 mice at each age). c, Coronal sections through
S1BF from Ptenfl/fl and Lhx6-Cre;Ptenfl/fl mice at P16 immunostained for
GABA (red, left), NeuN (green, middle) and isolectin B4 (IB4, cyan, right)
and counterstained with DAPI (grey). d, Quantification of the density of
GABA+ (far left) and NeuN+ GABA− (left) cells and vessel area (right)
and diameter (far right) in P16 Ptenfl/fl (grey) and Lhx6-Cre;Ptenfl/fl
(magenta) mice (two-tailed unpaired Student's t-test, **P = 0.0035
(GABA), *P = 0.0326 (vessel area), P = 0.0810 (vessel diameter);
Kolmogorov–Smirnov test, P = 0.1000 (NeuN+ GABA− cells), n = 3 mice
for both genotypes). e, Quantification of the distribution of PV+ (left) and
SST+ (right) cells in P16 Ptenfl/fl (grey) and Lhx6-Cre;Ptenfl/fl (magenta)
mice (two-way ANOVA, Fgenotype = 0.29, P = 0.61 (PV); Fgenotype = 0.0004,
P = 0.98 (SST); n = 4 Ptenfl/fl mice and 3 Lhx6-Cre;Ptenfl/fl mice). Data are
shown as mean ± s.e.m. Scale bars, 100 µm.
Extended Data Fig. 9 | Pharmacological inhibition of PTEN during the
interneuron cell death period increases interneuron survival.
a, f, Schematics of experimental design. b, Coronal sections through S1BF
from P10 mice injected at P7–P8 with vehicle (left) or BpV(pic) (right) and
stained for isolectin B4 (IB4, cyan) and DAPI (grey). c, Quantification
of blood vessel area (left) and diameter (right) in P10 mice treated with
vehicle (grey) or BpV(pic) (magenta) (Kolmogorov–Smirnov test (vessel
area), P = 0.60; two-tailed unpaired Student's t-test (vessel diameter),
P = 0.58, n = 3 animals for each group). d, g, Coronal sections through
S1BF from P21 mice injected at P7–P8 (d) or P12–P13 (g) with vehicle
(left) or BpV(pic) (right) and immunostained for PV and SST and
counterstained with DAPI. e, h, Quantification of the density of PV+ (left)
and SST+ (right) cells in S1BF from P21 mice injected at P7–P8 (e) or
P12–P13 (h) with vehicle (grey) or BpV(pic) (magenta) (P7–P8 groups:
two-tailed unpaired Student's t-test, *P = 0.04 (PV), *P = 0.03 (SST); n = 7
mice for each group; P12–P13 groups: two-tailed unpaired Student's t-test,
P = 0.84 (PV), P = 0.82 (SST), n = 5 animals for each group). Data are
shown as mean ± s.e.m. Scale bars, 100 µm.
Structural basis of ubiquitin modification
Protein ubiquitination is a multifaceted post-translational modification that controls almost every process in eukaryotic
cells. Recently, the Legionella effector SdeA was reported to mediate a unique phosphoribosyl-linked ubiquitination
through successive modifications of Arg42 of ubiquitin (Ub) by its mono-ADP-ribosyltransferase (mART) and
phosphodiesterase (PDE) domains. However, the mechanisms of SdeA-mediated Ub modification and phosphoribosyl-linked
ubiquitination remain unknown. Here we report the structures of SdeA in its ligand-free, Ub-bound and Ub-NADH-bound
states. The structures reveal that the mART and PDE domains of SdeA form a catalytic domain over its
C-terminal region. Upon Ub binding, the canonical ADP-ribosyltransferase toxin turn-turn (ARTT) and phosphate-nicotinamide
(PN) loops in the mART domain of SdeA undergo marked conformational changes. Ub Arg72 might
act as a ‘probe’ that interacts with the mART domain first, after which the side chains of Arg72 and Arg42 may move
during the ADP-ribosylation of Ub. Our study reveals the mechanism of SdeA-mediated Ub modification and
provides a framework for further investigations into the phosphoribosyl-linked ubiquitination process.


Ubiquitination is one of the most prevalent protein modifications in
eukaryotic cells, regulating a wide array of essential cellular processes.
Ubiquitination is carried out by a three-enzyme cascade (E1, E2 and
E3), and results in the transfer of ubiquitin (Ub) to a lysine residue of
the substrate. Prokaryotes do not contain the Ub-proteasome system,
but a variety of bacterial pathogens adopt intricate mechanisms to
manipulate the host Ub system to support their own survival. Legionella
pneumophila, the causative agent of a potentially fatal pneumonia known
as Legionnaires’ disease, can survive and replicate within host cells
by creating a vacuole. The biogenesis of the Legionella-containing
vacuole depends on approximately 300 Legionella substrates (effectors)
that are translocated into host cells. Notably, recent studies
have shown that the SidE family effectors from L. pneumophila
can catalyse Ub transfer to several endoplasmic reticulum-associated
human Rab GTPases and to the endoplasmic reticulum protein reticulon 4
(RTN4), using a unique approach that differs from the canonical
ubiquitination pathway. By targeting RTN4, the SidE family proteins
could control the dynamics of the tubular endoplasmic reticulum and
promote structural transformations of the tubules.

The SidE family members are large proteins (approximately 1,500
residues; Extended Data Fig. 1). SdeA, for example, contains a
deubiquitinase (DUB) domain (SdeA DUB), a PDE domain (SdeA
PDE), an mART domain (SdeA mART) and a C-terminal domain
(SdeA CTD) (Fig. 1a). During the ubiquitination process, the R42
residue of Ub is first ADP-ribosylated with NAD+ by SdeA mART, and
then the phosphodiester bond of the ADP-ribosylated Ub (ADPR-Ub)
is cleaved by SdeA PDE to produce phosphoribosyl Ub (PR-Ub).
PR-Ub can either remain free or be linked via a phosphodiester
bond to the hydroxyl group of serine residues in either the substrates
or SdeA itself, in a reaction catalysed by SdeA PDE. However, the
structural mechanism behind these modifications remains to be investigated.


Overall structure of SdeA 

To understand the mechanism underlying SdeA-mediated Ub modifications,
we first solved the crystal structure of a truncated SdeA (amino
acids 231-1190, hereafter called SdeA(231-1190)) (Extended Data
Table 1) at 3.39 Å resolution. This region of the protein includes SdeA
PDE, SdeA mART and a part of SdeA CTD (SdeA pCTD). SdeA(231-1190)
exhibited ubiquitination activities similar to those of full-length
SdeA (Extended Data Fig. 2a, b). The crystal belonged to the C2221
space group, and two SdeA molecules were found in the asymmetric
unit. However, results from the PISA server, gel filtration chromatography
and analytical ultracentrifugation indicated that SdeA
mainly exists as a monomer in solution (Extended Data Fig. 2d, e).
Notably, the two SdeA molecules in the asymmetric unit exhibited
obvious conformational differences, with a core root mean square
deviation (r.m.s.d.) of 1.82 Å among 793 Cα atoms. Superimposition of the
two molecules revealed a closer alignment of their PDE and mART
domains but prominent differences between their pCTDs, indicating
that SdeA pCTD might be flexible in solution (Extended Data Fig. 2f).
In the SdeA(231-1190) structure, SdeA mART and SdeA PDE interact
primarily through hydrophobic interactions and form a catalytic core
(Fig. 1b, c and Extended Data Fig. 3a–d), which sits on top of SdeA
pCTD. In contrast to the more conserved PDE and mART domains,
SdeA pCTD adopts an overall novel fold composed entirely
of α-helices (Fig. 1b), which can be divided into two subdomains on
either side below SdeA mART (Extended Data Fig. 2g, h).
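The r.m.s.d. values quoted here summarize how far paired Cα atoms lie from one another once two models have been superimposed: r.m.s.d. = sqrt((1/N) Σ |r_i − r′_i|²). The sketch below computes this quantity for coordinates that are assumed to be already superimposed. It does not reproduce the least-squares rotation (normally done by the crystallographic software) or any outlier rejection that a "core" r.m.s.d. may involve. The function name is hypothetical.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD (in the input units, e.g. Å) between paired, pre-superimposed Cα atoms.

    coords_a and coords_b are equal-length lists of (x, y, z) tuples, where
    entry i of each list is the same residue in the two models."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two equal-length, non-empty lists of (x, y, z) tuples")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Two hypothetical two-residue models offset by 1 Å along z.
print(ca_rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]))
```

Because the mean is taken over all paired atoms, a few large local shifts (such as the flexible pCTDs described above) can raise the overall value even when most of the chain aligns closely.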


Structure of the mART domain of SdeA 

SdeA mART contains a typical Rossmann fold (Fig. 2a) and exhibits
the basic characteristics that are conserved among all known bacterial
mART toxins. SdeA mART folds into a two-lobe structure with
an N-terminal α-helical lobe (residues 594-758) and a C-terminal
Fig. 1 | Overall structure of SdeA(231-1190). a, Domain architecture
of SdeA. The residues corresponding to ΔNC SdeA are indicated. b, Two
different views of the overall structure of SdeA(231-1190), coloured as in a.
c, Two different views of the surface model of SdeA(231-1190).


β-sandwich lobe (Fig. 2a). Notably, structural alignment revealed
several unique features in the SdeA mART structure (Extended Data
Fig. 4a–d). First, the canonical ARTT and PN loops in SdeA mART both
adopt conformations different from those of the other mART proteins.
Second, the N-terminal α-helical lobe of SdeA mART also shows
an overall fold different from those of the others. Third, two consecutive
protruding helices are connected by a loop (residues 789-797, here
named the plug), which inserts into and interacts with SdeA PDE. This
plug loop is, to our knowledge, unique among known mART structures
(Fig. 2a and Extended Data Figs. 3a–d, 4a–d). Notably, the much higher
ADP-ribosylation activity of SdeA(193-935)H277A compared with
SdeA(597-935) (the isolated SdeA mART) indicates that, unlike SdeA
PDE, which is fully active as a single domain (that is, it is able to process
ADPR-Ub to PR-Ub and catalyse ubiquitination (Extended Data
Fig. 3e)), SdeA mART needs to be stabilized by SdeA PDE to be active.
Moreover, deletion of the plug loop of SdeA mART markedly reduced
the activity of SdeA(193-998) (ΔNC SdeA) (Fig. 2f and Extended
Data Figs. 3f, 4e–h).


SdeA mART has a unique Ub-binding mode 

To our knowledge, SdeA mART is the first mART domain reported to
catalyse ADP-ribosylation of Ub. To understand the molecular
mechanism by which SdeA mART recognizes and mediates ADP-ribosylation
of Ub, we solved the structure of the SdeA(231-1190)-Ub
complex by mixing the two proteins directly in molar ratios of 1:4, 1:6
and 1:8 (SdeA:Ub) before crystallization. Notably, the SdeA(231-1190)
structure was solved with three bound Ub molecules, one at SdeA mART and
the other two at SdeA pCTD (Fig. 2b and Extended Data Fig. 5a–d).
The binding of Ub caused a prominent conformational change
(r.m.s.d. = 2.24 Å among 830 Cα atoms) in SdeA, particularly in the
N-terminal region of SdeA pCTD (Extended Data Fig. 5b), again
demonstrating the flexibility of SdeA pCTD. However, SdeA pCTD
was dispensable for the in vitro activity of SdeA, as SdeA(193-935)
was fully capable of Ub modification and RAB33B ubiquitination
(Extended Data Figs. 4i, 5a). We therefore focused on the Ub bound
to SdeA mART in subsequent studies.

To our knowledge, Ub-SdeA mART binding represents a novel
binding mode that differs from all known Ub-protein interactions.
Burying a surface area of 607.8 Å², the overall structure of SdeA mART
in the complex is similar to that of apo-SdeA mART, with an r.m.s.d.
Fig. 2 | SdeA mART undergoes conformational changes upon
Ub binding. a, Structure of SdeA mART. b, Overall structure of the
SdeA(231-1190)-Ub complex. SdeA is coloured as in Fig. 1a; three Ub
molecules are coloured magenta. c, Structural superimposition of
apo-SdeA mART (grey) and the SdeA mART-Ub complex (coloured as in b)
shows conformational changes. d, Enlarged view of the outlined region in
c; major SdeA mART residues that move upon Ub binding are shown
in stick representation. e, Expanded view showing a superimposition
of regions of the ARTT and PN loops before and after Ub binding. The
conformational change is highlighted with a blue arrow. f, Relative
amounts of unmodified Ub, PR-Ub and ADPR-Ub, determined by
top-down liquid chromatography–mass spectrometry analysis after they
were isolated from the reaction mixtures. WT, wild type.


of 1.04 Å among 293 Cα atoms (Fig. 2c). Nonetheless, the ARTT and
PN loops undergo marked conformational changes upon Ub binding
(Fig. 2d, e). In response to Ub binding, the Cα atom of SdeAF® in the
ARTT loop moves 6.0 Å towards Ub to interact with Ub R72 via its
side chain, which is simultaneously stabilized by SdeA2” (Figs. 2d, 3a).
In addition to Ub R72, another Ub residue that is essential for its
recognition is Ub R74, which inserts into a negatively charged groove on SdeA
mART by forming electrostatic interactions with the side chains of
SdeA D691 and SdeA D707 from the α-helical lobe (Fig. 3a and Extended
Data Fig. 6a, b). Meanwhile, the Cα atom of SdeA'®? in the PN loop
moves 3.8 Å away from its original position upon Ub binding (Fig. 2d).
Notably, SdeA®*? acts as a gate for Ub binding: in the apo-SdeA structure,
it occupies the site taken by the loop that links β3 and α2 of Ub
(Fig. 2d). Together, the ARTT and PN loops and the α-helical lobe of
SdeA mART contribute to Ub recognition.

Of the SdeA residues that interact with Ub, SdeA** has been shown
to have an essential role in ADP-ribosylation of Ub. Moreover,
SdeA mutant proteins (D691A/D707A, T831D and W832A) all
showed decreased activities in Ub phosphoribosylation and substrate
ubiquitination, and these mutations also inhibited the yeast toxicity
of SdeA (Fig. 3a–c and Extended Data Fig. 7a, b). Consistently, the
Ub R74A mutant showed no modification upon treatment with SdeA, as
has previously been observed for the Ub R42A and Ub R72A mutants (Fig. 3d).
We also found that these three Ub mutants were defective in the SdeA
Fig. 3 | Ub R72 and Ub R74 mediate Ub recognition by SdeA mART.
a, Detailed interactions between Ub (magenta) and SdeA mART (green).
Polar interactions are represented by red dashed lines. b, Wild-type
ΔNC SdeA or the mutant proteins were incubated with Ub, RAB33B
and NAD+ for 30 min at 37 °C. The samples were then subjected to
SDS–PAGE, with Coomassie staining (top) and phospho-specific staining
with Pro-Q Diamond (bottom). c, Galactose-inducible pYES2 plasmids
containing wild-type ΔNC SdeA or the mutants were transformed into
yeast strain W303. Five microlitres of cells in three tenfold serial dilutions
were spotted on both glucose- and galactose-containing plates lacking
uracil for two days before image acquisition. d, Wild-type Ub and the


mART-catalysed reaction, as they could not be ADP-ribosylated by
SdeA(193-935)H277A (Fig. 2f and Extended Data Fig. 4f).

The interaction pair of SdeA***? and Ub**, and the I44 patch of Ub,
seemed to have a minor role in SdeA mART-Ub binding, as mutations
of these residues did not influence Ub modification or substrate
ubiquitination (Fig. 3b–d). SdeA mART-mediated ADP-ribosylation
of Ub is specific, as the Ub-like proteins SUMO1 and NEDD8 could not be
ADP-ribosylated by SdeA(193-935)H277A, which is consistent with their
structural differences (Extended Data Fig. 6c, d). Structural alignments
also suggested that SdeA mART is able to modify both the proximal
and distal moieties of K63-, K48-, K11- and M1-linked diubiquitins
(Extended Data Fig. 5e–h), a finding that is consistent with a previous
study.

To further investigate the Ub modification mechanism of SdeA, we
solved the structure of the SdeA(231-1190)-Ub-NADH complex using
a soaking approach (Extended Data Table 1). NADH has been reported to
inhibit the mART activity of the enzyme component of the
iota-toxin from Clostridium perfringens and is predicted to occupy the
NAD+-binding site. Similarly, NADH also had an inhibitory effect on
the SdeA-catalysed ubiquitination of RAB33B (Extended Data Fig. 2c).
In the structure of the complex, NADH adopts a ring-like conformation
in the cavity (Fig. 3e and Extended Data Fig. 7e). The overall structure
of SdeA mART-Ub remains unchanged upon NADH binding (r.m.s.d.
indicated mutants were incubated with ΔNC SdeA, RAB33B and NAD+
for 30 min at 37 °C. The samples were then treated as in b. Uncropped
blots and gel images for b and d are shown in Supplementary Fig. 1.
e, Superimposition of the structures of SdeA-Ub and SdeA-Ub-NADH.
The whole SdeA-Ub complex is shown in grey. SdeA and Ub in the
SdeA-Ub-NADH complex are shown as in a. f, MST assays of the binding of
NAD+ to wild-type SdeA(231-1190) and the indicated mutants. Means are
indicated by horizontal lines and individual values from three independent
experiments are shown with their markers. Binding curves and Kd values
are also shown. NA, Kd could not be determined (N723A, R729A and
R766A mutants).




of 0.36 Å among 307 Cα atoms). Marked conformational changes occur
only in the side chains of SdeA®”® and SdeA’**”, which interact with
the phosphate group of NADH through hydrogen bonding and with
the nicotinamide ring through π–π interactions, respectively (Fig. 3e).
Mutations of the NADH-binding residues of SdeA all impaired the
Ub modification, substrate ubiquitination and yeast toxicity of SdeA
(Fig. 3b, e and Extended Data Fig. 7c, d, f). The decreased binding of
the mutant proteins to NAD+ was confirmed using the microscale
thermophoresis (MST) assay (Fig. 3f). Notably, although the SdeA
variants with mutations in the Ub- or NADH-binding sites were not
able to catalyse ubiquitination with unmodified Ub, they could use and
process ADPR-Ub to complete the ubiquitination reaction (Extended
Data Fig. 7g–i), indicating that none of the mutations interferes with
the activity of SdeA PDE.
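Kd values from MST titrations such as those in Fig. 3f are commonly obtained by fitting a single-site binding model that accounts for ligand depletion (the quadratic binding equation). The sketch below shows that model together with a crude log-spaced grid search for Kd. It assumes the unbound and bound signal plateaus and the labelled-protein concentration are known, none of which the text states. The function names and example concentrations are hypothetical, and real analyses usually fit the plateaus as well, using nonlinear least squares.

```python
import math


def fraction_bound(ligand, target, kd):
    """Fraction of labelled target bound at total ligand concentration `ligand`
    (single site, ligand depletion included; all concentrations in one unit)."""
    s = target + ligand + kd
    return (s - math.sqrt(s * s - 4.0 * target * ligand)) / (2.0 * target)


def mst_signal(ligand, target, kd, f_unbound, f_bound):
    """Normalized MST signal as a linear mix of the unbound and bound plateaus."""
    return f_unbound + (f_bound - f_unbound) * fraction_bound(ligand, target, kd)


def fit_kd_grid(ligands, signals, target, f_unbound, f_bound,
                kd_min=1e-3, kd_max=1e5, steps=2000):
    """Return the Kd on a log-spaced grid that minimizes the squared residuals."""
    best_kd, best_sse = kd_min, math.inf
    for k in range(steps + 1):
        kd = kd_min * (kd_max / kd_min) ** (k / steps)
        sse = sum((mst_signal(lig, target, kd, f_unbound, f_bound) - y) ** 2
                  for lig, y in zip(ligands, signals))
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd


# Hypothetical noise-free titration: 0.05 µM labelled protein, true Kd = 20 µM.
ligands = [0.1 * 2 ** i for i in range(16)]
signals = [mst_signal(lig, 0.05, 20.0, 800.0, 820.0) for lig in ligands]
print(fit_kd_grid(ligands, signals, 0.05, 800.0, 820.0))
```

If a mutant barely shifts the signal across the titration range, the residual surface becomes flat and no Kd is well defined. That situation matches the legend's "could not be determined".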


Suggested mechanism of SdeA mART ADP-ribosylation 

An SN1 nucleophilic reaction mechanism for the ADP-ribosylation
of arginine residues by mART proteins has been proposed by several
studies. However, a notable feature of the SdeA-Ub-NADH complex
structure is the 11.7 Å distance between the nucleophile (NH1/2 atom
of Ub R42) and the electrophile (C1D of the N-ribose, the ribose linked
to the nicotinamide group of NAD+), which is too far for an SN1 attack
(Fig. 4a). Nevertheless, Ub R72, which forms polar interactions with the
Fig. 4 | Proposed conformational changes during the reaction and
function of SdeA CTD. a, The conformations of Ub R42 and Ub R72 in the
SdeA mART-Ub-NADH complex. b, Molecular dynamics
simulation results for the SdeA mART-Ub-NAD+ complex (left) and the
SdeA mART-Ub-intermediate system (right). c, In vitro glutathione


side chain of SdeA™®®°, the backbone carbonyl of SdeA®”!3 and the
O3D atom of the NADH ribose group, is much closer to the site of the
electrophile, at a distance of 4.6 Å (Fig. 4a). Because we could not
obtain the structure of the SdeA-Ub-NAD+ complex, we performed
molecular dynamics simulations with NADH replaced by NAD+ in the
complex, during which the positions of the side chains of Ub R42 and
Ub R72, and of NAD+, remained almost unchanged (Fig. 4b and Extended
Data Fig. 7j–m).

In the proposed SN1 mechanism, the highly folded and strained
conformation of the nicotinamide mononucleotide region of NAD+
induces an equilibrium shift towards formation of an oxocarbenium
cation intermediate (NAD+ with its nicotinamide group cleaved, hereafter
referred to as the intermediate). We therefore also performed
molecular dynamics simulations with NADH replaced by the intermediate
in the SdeA mART-Ub-NADH structure (Extended Data Fig. 7j–m).
Notably, after the simulation, Ub R72 had moved towards the ARTT loop
and away from the intermediate. Instead, Ub R42 entered the active site,
occupied the original position of Ub R72 and formed electrostatic
interactions with SdeA**® (Fig. 4b). After the system reached equilibrium,
the average distance between the nucleophile (Ub R42) and the
electrophile was 4.46 Å (Fig. 4b and Extended Data Fig. 7m). Consistent
with the conformational change observed in the molecular dynamics
simulation, the conformation of Ub®” is highly variable among the
available structures of Ub (Extended Data Fig. 8a).
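The average nucleophile-electrophile distance reported for the equilibrated simulation is a per-frame Euclidean distance averaged over the frames that follow equilibration. Below is a minimal sketch of that calculation. It assumes the trajectory has already been reduced to per-frame coordinates of the two guanidinium nitrogens (NH1, NH2) of the nucleophilic arginine and of the ribose C1D atom. It also assumes the closer nitrogen is the relevant one, and that the equilibration cut-off is supplied by the user, since the text does not say how it was chosen. The function names are hypothetical.

```python
import math


def nucleophile_electrophile_distance(nh_atoms, c1d):
    """Distance from the closer guanidinium nitrogen (NH1 or NH2) to C1D."""
    return min(math.dist(atom, c1d) for atom in nh_atoms)


def mean_distance_after_equilibration(frames, n_equil):
    """Average the per-frame distance, skipping the first n_equil frames.

    frames: list of (nh_atoms, c1d) tuples, one per trajectory frame."""
    kept = frames[n_equil:]
    if not kept:
        raise ValueError("no frames left after the equilibration cut-off")
    return sum(nucleophile_electrophile_distance(nh, c) for nh, c in kept) / len(kept)


# Hypothetical four-frame trajectory in which the electrophile approaches.
nh = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
frames = [(nh, (10.0, 0.0, 0.0)), (nh, (8.0, 0.0, 0.0)),
          (nh, (4.0, 0.0, 0.0)), (nh, (5.0, 0.0, 0.0))]
print(mean_distance_after_equilibration(frames, n_equil=2))
```

Discarding the pre-equilibrium frames keeps the large initial distance (about 11.7 Å in the crystal-derived starting model) from biasing the average.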

On the basis of the structural and biochemical results, we propose that
during the catalytic process, Ub R72 might function as a ‘probe’, together
with Ub R74, by anchoring Ub on SdeA mART. After cleavage of the
nicotinamide group from NAD+, the strain in the highly folded structure
of the intermediate would be alleviated, which might destabilize
the binding of Ub R72, causing it to leave. This in turn could facilitate
the approach of Ub R42 to the active site. However, the exact catalytic
cycle requires further investigation. Moreover, ADPR-Ub produced by SdeA
mART will be processed to PR-Ub and linked to the target protein
S-transferase (GST) pull-down assays to detect the IcmS-binding region of
SdeA. d, Superposition of the gel filtration chromatograms of SdeA(1191-1350),
the IcmS-IcmW-DotLc complex and a mixture of both. Coomassie
blue staining of the peak fractions after SDS–PAGE is shown on the right.
c, d, Uncropped gel images are shown in Supplementary Fig. 1.


by SdeA PDE, the mechanism of which can be inferred from the structure
of SdeA PDE with its substrates.


SdeA CTD interacts with IcmS-IcmW 

Because the N-terminal SdeA DUB and approximately 300 C-terminal
residues were not included in our crystallized construct, we
next studied the solution structures of four SdeA constructs using
small-angle X-ray scattering (SAXS) to investigate the
spatial position of SdeA PDE-SdeA mART within SdeA (Extended
Data Table 2 and Extended Data Fig. 8c–h). Both the scattering profile
comparison and the reconstructed molecular envelope indicated that
the crystal structure of SdeA(231-1190) is similar to its structure in
solution (Extended Data Fig. 8d, e). Superimposing the envelope of
SdeA(231-1190) onto the envelope of SdeA(1-1190) further revealed
extra density within the envelope of SdeA(1-1190) attributable to
SdeA DUB (Extended Data Fig. 8f), which forms a triangle-shaped
catalytic core with SdeA mART and SdeA PDE. The envelope of the
C-terminal SdeA(1092-1496) indicated a helical-bundle shape for
this region (Extended Data Fig. 8g). Further superimposition of the
SAXS envelopes of SdeA(1-1499), SdeA(1-1190), SdeA(1092-1496)
and the crystal structure of SdeA(231-1190) reconfirmed the positions
of SdeA DUB and SdeA CTD (Extended Data Fig. 8h). Notably, in
vitro binding assays revealed that SdeA CTD is involved in binding
to the adaptor protein complex IcmS-IcmW, and that the minimal
binding region is SdeA(1191-1350), which could form a tight complex
with IcmS-IcmW-DotLc (residues 656-798 of DotL) (Fig. 4c, d and
Extended Data Fig. 8b). This suggests that SdeA CTD might function
in the translocation of SdeA into host cells.


Discussion 

An unresolved question is how ADPR-Ub is delivered from SdeA mART to SdeA PDE. One possibility is that two or more SdeA molecules are close to each other in vivo, so that ADPR-Ub produced by the mART domain of one SdeA could be used by the PDE domain of an adjacent one. Our study provides mechanistic insight into the structure and function of SdeA and serves as a foundation for further studies of phosphoribosyl-linked ubiquitination.
METHODS

Protein expression and purification. Full-length L. pneumophila SdeA and various segments of it were amplified by PCR and cloned into pGEX6p-1 or pET22b vectors to produce either GST-tagged fusion proteins with a PreScission protease cleavage site between GST and the target protein, or His-tagged proteins. SdeA mutants were generated by two-step PCR and were subcloned, overexpressed and purified in the same way as the wild-type protein. The SdeA clone with a deletion of residues 789-797 was made by bridging PCR, and a GSG sequence was inserted between the two SdeA fragments. Proteins were expressed in Escherichia coli strain BL21 and induced with 0.2 mM isopropyl-β-D-thiogalactopyranoside (IPTG) when the cell density reached an OD600 of 0.8. For GST-tagged proteins, after growth at 16 °C for 12 h, the cells were collected, resuspended in lysis buffer (1× PBS, 2 mM dithiothreitol (DTT) and 1 mM phenylmethanesulfonyl fluoride) and lysed by sonication. The cell lysate was centrifuged at 20,000g for 45 min at 4 °C to remove cell debris. The supernatant was applied onto a self-packed GST-affinity column (2 ml glutathione Sepharose 4B; GE Healthcare), and contaminating proteins were removed with wash buffer (lysis buffer plus 200 mM NaCl). The fusion protein was then digested with PreScission protease at 4 °C overnight, and the target protein, carrying an additional five-amino-acid tag (GPLGS) at the N terminus, was eluted with lysis buffer. The eluate was concentrated and further purified on a Superdex-200 column (GE Healthcare) equilibrated with a buffer containing 10 mM Tris-HCl pH 8.0, 200 mM NaCl and 5 mM DTT. The purified protein was analysed by SDS-PAGE, and the fractions containing the target protein were pooled and concentrated to 20 mg ml−1. Selenomethionine (Se-Met)-labelled SdeA was expressed in E. coli B834 (DE3) cells grown in M9 minimal medium supplemented with 60 mg l−1 Se-Met (Sigma-Aldrich) and specific amino acids: Ile, Leu and Val at 50 mg l−1; Lys, Phe and Thr at 100 mg l−1. The Se-Met protein was purified as described above. The SdeA(231-1190) segment was also cloned into the pET22b vector to make a construct with a C-terminal His tag, which was also used for purification and crystallization.
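
The Se-Met labelling supplements above are specified per litre of M9 medium. As a small convenience sketch (the function name and the 2-litre culture are hypothetical illustrations, not part of the protocol), the per-litre amounts can be scaled to any culture volume:

```python
# Se-Met labelling supplements for M9 minimal medium, in mg per litre,
# restated from the protocol text above.
SEMET_SUPPLEMENTS_MG_PER_L = {
    "Se-Met": 60.0,
    "Ile": 50.0,
    "Leu": 50.0,
    "Val": 50.0,
    "Lys": 100.0,
    "Phe": 100.0,
    "Thr": 100.0,
}


def supplement_amounts(volume_l: float) -> dict:
    """Mass (mg) of each supplement needed for `volume_l` litres of medium."""
    if volume_l <= 0:
        raise ValueError("culture volume must be positive")
    return {name: mg_per_l * volume_l for name, mg_per_l in SEMET_SUPPLEMENTS_MG_PER_L.items()}


if __name__ == "__main__":
    # Hypothetical example: a 2-litre expression culture.
    for name, mg in supplement_amounts(2.0).items():
        print(f"{name}: {mg:.0f} mg")
```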

The fragment of human RAB33B cDNA (residues 1-229) was cloned into MCS1 of the pRSFDuet vector to produce His-tagged fusion protein. Expression was induced in E. coli Rosetta (DE3) with 0.2 mM IPTG when the cell density reached an OD600 of 0.8. The recombinant His-tagged protein was purified by Ni-affinity chromatography and ion exchange chromatography, followed by gel filtration chromatography (Superdex-200 column) in buffer containing 10 mM Tris-HCl pH 8.0, 200 mM NaCl and 5 mM DTT.

All Ub mutants used in the ubiquitination assays were cloned into pGEX6p-1 vectors to produce GST-tagged fusion proteins. The proteins were purified according to the protocol for GST-tagged proteins, concentrated to 20 mg ml−1 and stored at −80 °C until use. In addition, GST-tagged wild-type Ub was used in the self-ubiquitination experiments of SdeA(231-1190).

For IcmS and its complexes, IcmS was cloned into the pGEX6p-1 vector and purified as described above. His-tagged IcmS and IcmW were cloned into the MCS1 and MCS2 sites of the pRSFDuet vector, respectively, and the IcmS-IcmW complex was purified as His-tagged protein according to the protocol described above. For co-expression of the IcmS-IcmW-DotLc complex, DotLc, cloned into the pET22b vector, was co-purified with the bacteria carrying the above pRSFDuet vector (His-IcmS and IcmW), following the same protocol as for the IcmS-IcmW complex. For co-expression of the IcmS-IcmW-DotLc-LvgA complex, LvgA was cloned into a modified pET15b vector to produce N-terminally His-MBP-tagged LvgA with a PreScission protease site between the tag and LvgA. E. coli BL21 (DE3) cells transformed with this vector and the above-mentioned pRSFDuet vector expressed these three proteins, and DotLc was expressed from the above-mentioned pET22b vector in E. coli BL21 (DE3). Equal volumes of the E. coli cultures were co-sonicated, and the cleared lysate was subjected to Ni-affinity chromatography. After elution from the column, the protein was treated with PreScission protease, and the complex was purified by ion exchange chromatography and gel filtration chromatography.

Crystallization, data collection and structure determination. SdeA(231-1190) was concentrated to 20 mg ml−1 in 10 mM Tris-HCl pH 8.0, 200 mM NaCl and 5 mM DTT. Crystals were grown by the hanging-drop vapour diffusion method at 18 °C by mixing equal volumes of the protein (20 mg ml−1) and reservoir solution containing 50% Tacsimate pH 7.0, 0.1 M Tris pH 8.8 and 6% sorbitol. The crystals appeared overnight and grew to full size in about 4-5 days. They were cryoprotected in reservoir solution containing 10% glycerol before transfer to liquid nitrogen. Se-Met-labelled protein was crystallized in the same condition, and these crystals diffracted better than the native crystals.
After hundreds of crystal diffraction tests on a Rigaku MicroMax-007HF and at beamlines BL17U1 and BL19U1 of the Shanghai Synchrotron Radiation Facility (SSRF)³², a crystal of Se-Met-labelled protein suitable for structure determination was finally obtained. Purified C-terminally His-tagged SdeA(231-1190) was mixed with purified Ub at molar ratios of 1:4, 1:6 and 1:8, with a final SdeA(231-1190) concentration of 24 mg ml−1. Crystals of the SdeA-Ub complex were also grown at 18 °C by mixing equal volumes of the protein mixture and reservoir solution containing 0.1 M sodium malonate pH 8.0, 0.1 M Tris pH 8.0 and 30% (w/v) polyethylene glycol 1,000. The crystals appeared after 2 days and grew to full size in about 4-5 days. These crystals were also used in the NADH-soaking experiment.
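
To make the 1:4, 1:6 and 1:8 SdeA:Ub mixtures concrete, the sketch below converts the fixed SdeA(231-1190) concentration of 24 mg ml−1 into the Ub concentration each ratio requires. The molecular masses are assumptions: about 106 kDa for SdeA(231-1190) (the analytical ultracentrifugation estimate given in Extended Data Fig. 2d) and about 8.57 kDa for Ub; sequence-derived masses of the actual tagged constructs would be more accurate.

```python
# Assumed molecular masses (Da); replace with sequence-derived values in practice.
SDEA_231_1190_DA = 106_000.0  # assumption: ~106 kDa, the AUC estimate for SdeA(231-1190)
UB_DA = 8_565.0               # assumption: approximate average mass of ubiquitin


def ub_for_ratio(sdea_mg_per_ml: float, ub_per_sdea: float,
                 sdea_da: float = SDEA_231_1190_DA, ub_da: float = UB_DA) -> tuple:
    """Return (Ub in uM, Ub in mg/ml) for an SdeA:Ub molar ratio of 1:ub_per_sdea."""
    sdea_molar = sdea_mg_per_ml / sdea_da    # mg/ml equals g/l; (g/l) / (g/mol) = mol/l
    ub_molar = sdea_molar * ub_per_sdea
    return ub_molar * 1e6, ub_molar * ub_da  # mol/l -> uM; mol/l * g/mol = g/l = mg/ml


if __name__ == "__main__":
    for ratio in (4, 6, 8):
        ub_um, ub_mg = ub_for_ratio(24.0, ratio)
        print(f"1:{ratio}  Ub = {ub_um:.0f} uM ({ub_mg:.2f} mg/ml)")
```

The same function applies to any other fixed-partner titration; only the two assumed masses need changing.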

All data were collected at SSRF beamlines BL17U1 and BL19U1 and were integrated and scaled using the HKL2000 package³³. Further processing was carried out using programs from the CCP4 suite³⁴. SHELXD³⁵ was used to locate the selenium sites in SdeA, and the identified anomalous scattering sites were input into PHASER³⁶ for single-wavelength anomalous dispersion (SAD) phasing. Real-space constraints were applied to the electron density map in DM³⁷. Because the resolution of the data was low, the final model of SdeA was rebuilt manually using Coot³⁸, and the apo-SdeA structure was refined with PHENIX³⁹ against the SAD data using non-crystallographic symmetry and stereochemistry information as restraints. The structures of the SdeA(231-1190)-Ub and SdeA(231-1190)-Ub-NADH complexes were solved by molecular replacement with the structures of apo-SdeA(231-1190) and Ub (PDB: 1UBQ) as templates.
Final Ramachandran statistics were 92.6% favoured, 6.0% allowed and 1.4% outliers for the apo-SdeA(231-1190) structure; 93.8% favoured, 4.9% allowed and 1.3% outliers for the SdeA(231-1190)-Ub structure; and 93.2% favoured, 5.5% allowed and 1.3% outliers for the SdeA(231-1190)-Ub-NADH structure. Structural illustrations were generated using PyMOL (v.1.8.0.0, https://pymol.org/). Data collection and structure refinement statistics are summarized in Extended Data Table 1.

MST assay. The NAD+ affinity of purified wild-type SdeA(231-1190) and its mutants was measured using a Monolith NT.115 (NanoTemper Technologies). All proteins were desalted into MST buffer (10 mM HEPES pH 7.5, 150 mM NaCl) before the experiment. The SdeA proteins were fluorescently labelled according to the manufacturer's procedure: the protein concentration was adjusted to 10 μM, and the fluorescent dye NT-647-NHS was added, mixed and incubated for 30 min at 25 °C in the dark. For each assay, the labelled protein (about 0.1 μM) was mixed at room temperature with an equal volume of unlabelled NAD+ at 16 serially diluted concentrations. The samples were then loaded into premium capillaries (NanoTemper Technologies) and measured at 25 °C using 20% LED power and medium MST power. Each assay was repeated three times. Data were analysed using MO.Affinity Analysis v.2.2.4 software. The reported Kd ranges correspond to 68% confidence intervals.
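
MO.Affinity Analysis performs the Kd fit internally. To illustrate what such a fit involves, here is a minimal pure-Python sketch that assumes a 1:1 binding model (the exact quadratic, law-of-mass-action solution), a 16-point twofold dilution series of NAD+ and a hypothetical labelled-protein concentration. It is not the vendor's algorithm: a grid search over Kd stands in for a proper nonlinear least-squares optimizer, and the synthetic curve in the example is invented for demonstration.

```python
import math


def fraction_bound(ligand_total: float, target_total: float, kd: float) -> float:
    """Fraction of labelled target in a 1:1 complex (exact quadratic solution).

    All three arguments must share one concentration unit (uM here).
    """
    s = ligand_total + target_total + kd
    complex_conc = (s - math.sqrt(s * s - 4.0 * ligand_total * target_total)) / 2.0
    return complex_conc / target_total


def serial_dilution(top: float, n: int = 16, factor: float = 2.0) -> list:
    """Concentrations of an n-point serial dilution, highest first."""
    return [top / factor ** i for i in range(n)]


def fit_kd(ligand, signal, target_total, kd_grid):
    """Grid search over Kd; for each candidate, the unbound signal and the
    bound-minus-unbound amplitude are obtained by linear least squares."""
    n = len(ligand)
    best = None
    for kd in kd_grid:
        f = [fraction_bound(c, target_total, kd) for c in ligand]
        sf, sff = sum(f), sum(x * x for x in f)
        sy, sfy = sum(signal), sum(x * y for x, y in zip(f, signal))
        det = n * sff - sf * sf
        if det == 0.0:
            continue
        amplitude = (n * sfy - sf * sy) / det
        unbound = (sy - amplitude * sf) / n
        rss = sum((y - (unbound + amplitude * x)) ** 2 for x, y in zip(f, signal))
        if best is None or rss < best[0]:
            best = (rss, kd, unbound, amplitude)
    if best is None:
        raise ValueError("no Kd candidate could be fitted")
    return best[1], best[2], best[3]


if __name__ == "__main__":
    # Hypothetical numbers: NAD+ from 2,500 uM down in 16 twofold steps,
    # 0.05 uM labelled SdeA, noise-free signal generated with Kd = 5 uM.
    nad = serial_dilution(2500.0)
    signal = [800.0 - 20.0 * fraction_bound(c, 0.05, 5.0) for c in nad]
    grid = [10 ** (k / 100) for k in range(-300, 401)]  # 0.001 to 10,000 uM
    kd, _, _ = fit_kd(nad, signal, 0.05, grid)
    print(f"fitted Kd ~ {kd:.2f} uM")
```

The quadratic form matters when the labelled-protein concentration is not negligible relative to Kd; the simpler hyperbola L/(L + Kd) is its limit for very dilute target.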

Structural analysis by SAXS. SAXS measurements were carried out at room temperature at beamline 12-ID-B of the Advanced Photon Source, Argonne National Laboratory, and at beamline BL19U2 of the National Center for Protein Science Shanghai and Shanghai Synchrotron Radiation Facility. The scattered X-ray photons were recorded with a PILATUS 1M detector (Dectris) at 12-ID-B and a PILATUS 100k detector (Dectris) at BL19U2. The set-ups were adjusted to achieve scattering vector ranges of 0.005 < q < 0.89 Å−1 (12-ID-B) or 0.009 < q < 0.415 Å−1 (BL19U2), where q = (4π/λ)sinθ, λ is the X-ray wavelength and 2θ is the scattering angle. Thirty 2D images were recorded for each buffer or sample solution using a flow cell, with an exposure time of 0.5-2 s to minimize radiation damage while obtaining a good signal-to-noise ratio. No radiation damage was observed, as confirmed by the absence of systematic signal changes in sequentially collected X-ray scattering images. The 2D images were reduced to 1D scattering profiles using Matlab (12-ID-B) or BioXTAS RAW (BL19U2). Scattering profiles of the proteins were obtained by subtracting the buffer background from the sample profile using the program PRIMUS⁴⁰ according to standard procedures⁴¹. Concentration series measurements (fourfold and twofold dilutions and stock solution) of the same sample were carried out to remove the scattering contribution from inter-particle interactions and to extrapolate the data to infinite dilution. The forward scattering intensity I(0) and the radius of gyration (Rg) were calculated from the infinite-dilution data at low q values in the range qRg < 1.3, using the Guinier approximation: ln I(q) ≈ ln I(0) − Rg²q²/3. These parameters were also estimated from the scattering profile over a broader q range of 0.006-0.30 Å−1 using the indirect Fourier transform method implemented in the program GNOM⁴², along with the pair distance distribution function (PDDF), P(r), and the maximum dimension of the protein, Dmax. Dmax (the upper end of the distance r) was chosen so that the resulting PDDF has a short, near-zero tail, to avoid underestimating the molecular dimension and consequently distorting the low-resolution structural reconstruction. The Porod volume (VPorod) and the volume of correlation (Vc) were calculated using the programs PRIMUS and Scatter, respectively. Molecular masses of the solutes were calculated on a relative scale using the Rg/Vc power law as previously described⁴³, as well as with AUTOPOROD⁴⁴, independently of protein concentration and with minimal user bias. The theoretical scattering intensity of the atomic structure model was calculated and fitted to the experimental scattering intensity using CRYSOL⁴⁵. Low-resolution ab initio shape reconstructions were performed with the program DAMMIN, which generates models represented by an ensemble of densely packed beads⁴⁶, using scattering data within the q range of 0.006-0.30 Å−1. Thirty-two independent runs for both programs were performed, and the resulting models were averaged by DAMAVER⁴⁷, superimposed by SUPCOMB⁴⁸ on the basis of the normalized spatial discrepancy criterion and filtered using DAMFILT to generate the final model.
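
The two SAXS relations used above, the definition of q and the Guinier approximation restricted to qRg < 1.3, can be illustrated with a short pure-Python sketch. It fits ln I(q) against q² and iteratively trims the q range until every retained point satisfies qRg ≤ 1.3. This is a simplified stand-in for the Guinier analysis in PRIMUS, not its implementation, and the synthetic profile in the test is hypothetical.

```python
import math


def q_from_two_theta(two_theta_rad: float, wavelength_a: float) -> float:
    """Scattering vector q = (4*pi/lambda) * sin(theta), where 2*theta is the scattering angle."""
    return 4.0 * math.pi / wavelength_a * math.sin(two_theta_rad / 2.0)


def _linear_fit(x, y):
    """Ordinary least-squares slope and intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def guinier_fit(q, intensity, qrg_max=1.3, max_iter=50):
    """Fit ln I(q) = ln I(0) - Rg^2 q^2 / 3 on points with q*Rg <= qrg_max.

    Starts from all points with I > 0, then repeatedly refits on the subset
    allowed by the current Rg estimate. Returns (Rg, I0); Rg is in the inverse
    unit of q (q in 1/A gives Rg in A).
    """
    points = [(qi, ii) for qi, ii in zip(q, intensity) if ii > 0]
    rg = None
    intercept = None
    for _ in range(max_iter):
        used = points if rg is None else [(qi, ii) for qi, ii in points if qi * rg <= qrg_max]
        if len(used) < 3:
            raise ValueError("too few points in the Guinier region")
        slope, intercept = _linear_fit([qi * qi for qi, _ in used], [math.log(ii) for _, ii in used])
        if slope >= 0:
            raise ValueError("intensity does not decay with q; no Guinier region found")
        new_rg = math.sqrt(-3.0 * slope)
        converged = rg is not None and abs(new_rg - rg) < 1e-9
        rg = new_rg
        if converged:
            break
    return rg, math.exp(intercept)
```

Real profiles deviate from pure Guinier behaviour at higher q, which is why the fit range is tied to the current Rg estimate rather than fixed in advance.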

Analytical ultracentrifugation. Sedimentation velocity experiments were performed at 20 °C with a Beckman XL-I analytical ultracentrifuge (Beckman Coulter) equipped with a four-hole An-60 Ti rotor. Samples were clarified by centrifugation at 12,000 r.p.m. for 10 min in a tabletop centrifuge before the experiment. Reaction buffer containing 10 mM Tris pH 8.0, 200 mM NaCl and 5 mM DTT was used as the reference solution, and ~400 μl of SdeA(231-1190) peak fractions (OD280 = 0.7) was loaded into two-channel centrepieces fitted with sapphire windows. Absorbance scans at 280 nm versus radial position were taken during centrifugation at 30,000 r.p.m. The differential sedimentation coefficient distribution, c(s), frictional coefficients and molecular mass were calculated using SEDFIT software.
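
SEDFIT's c(s) analysis estimates molecular mass from the sedimentation coefficient together with a fitted frictional parameter. The underlying link between sedimentation, diffusion and mass is the Svedberg equation, M = sRT/[D(1 − v̄ρ)], sketched below in SI units. The diffusion coefficient, the partial specific volume (0.73 ml g−1, a typical protein value) and the buffer density (1.0 g ml−1) are assumed inputs rather than values from this study; only the 20 °C temperature is taken from the protocol.

```python
GAS_CONSTANT = 8.314462618  # J mol^-1 K^-1


def svedberg_mass_kda(s_svedberg, diffusion_m2_s, temp_k=293.15,
                      vbar_ml_g=0.73, rho_g_ml=1.0):
    """Molar mass from the Svedberg equation M = s*R*T / (D * (1 - vbar*rho)).

    s in svedbergs (1 S = 1e-13 s), D in m^2/s. vbar (ml/g) and rho (g/ml) are
    assumed typical values; their product is dimensionless. The result in kg/mol
    is numerically equal to kDa.
    """
    buoyancy = 1.0 - vbar_ml_g * rho_g_ml
    if buoyancy <= 0:
        raise ValueError("1 - vbar*rho must be positive for sedimentation")
    return s_svedberg * 1e-13 * GAS_CONSTANT * temp_k / (diffusion_m2_s * buoyancy)


def diffusion_from_mass(s_svedberg, mass_kda, temp_k=293.15, vbar_ml_g=0.73, rho_g_ml=1.0):
    """Inverse relation: diffusion coefficient (m^2/s) implied by s and M."""
    buoyancy = 1.0 - vbar_ml_g * rho_g_ml
    return s_svedberg * 1e-13 * GAS_CONSTANT * temp_k / (mass_kda * buoyancy)


if __name__ == "__main__":
    # s = 5.13 S as reported for SdeA(231-1190); D = 4.5e-11 m^2/s is hypothetical.
    print(f"M ~ {svedberg_mass_kda(5.13, 4.5e-11):.1f} kDa")
```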

Molecular dynamics simulation. Molecular dynamics simulations were based on the crystal structure of the SdeA-Ub-NADH complex described here and were carried out for two systems: (1) the NAD+-bound complex (the SdeA mART-Ub-NADH structure with NADH replaced by NAD+) and (2) the intermediate-bound complex (the SdeA mART-Ub-NADH structure with the nicotinamide group of the ligand removed). Hydrogen atoms were added with optimal hydrogen-bonding networks and side-chain protonation states determined at pH 7.0 by PROPKA⁴⁹,⁵⁰. The protein chain termini of SdeA were capped with neutral acetyl and methylamide groups. Each system was solvated in a cubic water box with a 10 Å buffer, and neutralizing counter ions were added. To mimic experimental assay conditions, a 0.15 M NaCl salt bath was introduced. We used the OPLS-AA 2005 force field⁵¹ parameter set for the protein and ligand, and the TIP3P model for water. Parameters for the ligands (NAD+ and the intermediate) were generated with the LigParGen⁵² web server, which uses the BOSS software⁵³ to assign bonded and van der Waals parameters by analogy to existing atom types. Charges were calculated and assigned by a semi-empirical AM1⁵⁴ calculation using the CM1A charge model⁵⁵,⁵⁶. Simulations were performed with the Desmond software package⁵⁷. Prepared systems were first minimized using 5,000 steps of a steepest descent algorithm and then equilibrated as follows. The system was heated from 0 to 310 K in the isothermal-isobaric (NpT) ensemble over 100 ps, with harmonic restraints of 10.0 kcal mol−1 Å−2 on the heavy atoms of the protein and ligand and initial velocities sampled from the Boltzmann distribution. Further equilibration was performed at 310 K with harmonic restraints on the protein and ligand, starting at 10.0 kcal mol−1 Å−2 and reduced stepwise by 1.0 kcal mol−1 Å−2 every 2 ns, for a total of 20 ns of additional restrained equilibration. Production runs of 200 ns were then performed in the NpT ensemble. The M-SHAKE algorithm⁵⁸ was applied to constrain all bonds involving hydrogen atoms, with a time step of 2 fs. Short-range electrostatic and Lennard-Jones interactions were cut off at 9 Å, and long-range electrostatic interactions were computed by the particle mesh Ewald method⁵⁹.
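
The restrained-equilibration schedule and the salt bath described above can be made explicit with a small Python sketch. It lists the stepwise restraint constants, read here as ten 2-ns stages from 10.0 down to 1.0 kcal mol−1 Å−2 (20 ns in total, as stated), and estimates the number of Na+/Cl− pairs for a 0.15 M salt bath from a box volume. The ion estimate uses the whole box volume, a simplifying assumption (Desmond's system builder may count differently), and the 80-Å solute extent in the example is hypothetical.

```python
AVOGADRO = 6.02214076e23           # mol^-1
LITRES_PER_CUBIC_ANGSTROM = 1e-27  # 1 A^3 = 1e-30 m^3 = 1e-27 l


def restraint_schedule(start=10.0, step=1.0, stage_ns=2.0, total_ns=20.0):
    """(start_ns, end_ns, k) for each restrained equilibration stage.

    k is the harmonic restraint constant in kcal mol^-1 A^-2; the defaults give
    ten 2-ns stages with k = 10, 9, ..., 1.
    """
    n_stages = round(total_ns / stage_ns)
    schedule = []
    for i in range(n_stages):
        k = start - i * step
        if k <= 0:
            raise ValueError("restraint constant reaches zero before the schedule ends")
        schedule.append((i * stage_ns, (i + 1) * stage_ns, k))
    return schedule


def cubic_box_edge(solute_extent_a, buffer_a=10.0):
    """Edge length (A) of a cubic box leaving `buffer_a` of water on each side of the solute."""
    return solute_extent_a + 2.0 * buffer_a


def salt_ion_pairs(conc_molar, box_volume_a3):
    """Approximate number of Na+/Cl- pairs (simplification: whole box volume, not solvent volume)."""
    return round(conc_molar * AVOGADRO * box_volume_a3 * LITRES_PER_CUBIC_ANGSTROM)


if __name__ == "__main__":
    for t0, t1, k in restraint_schedule():
        print(f"{t0:4.0f}-{t1:4.0f} ns: k = {k:.1f} kcal/mol/A^2")
    edge = cubic_box_edge(80.0)  # hypothetical 80-A solute extent
    print("box edge:", edge, "A; NaCl pairs:", salt_ion_pairs(0.15, edge ** 3))
```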

Yeast toxicity assay. Yeast strain W303 was used for all experiments. Yeast was grown at 30 °C in YPD for transformation, or in appropriate selective medium lacking uracil and containing either 2% glucose or galactose as the carbon source. For expression in yeast, genes were cloned into the pYES2 vector containing the galactose-inducible promoter. The integrity of all constructs was verified by sequencing. For each construct, about 1 μg of plasmid DNA was used to transform yeast cells by the standard lithium acetate method. For yeast toxicity experiments, W303 cells carrying the defined plasmids were grown overnight in synthetic medium lacking uracil and containing 2% glucose. The cells were collected, washed once with sterile water and resuspended in sterile water to an OD600 of 1.0, 0.1 or 0.01. Then 5-μl aliquots of these suspensions were spotted onto solid synthetic defined medium lacking uracil and containing either 2% glucose or galactose (to induce protein expression). Plates were incubated at 30 °C, and images were acquired after 2 days of growth.
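
Normalizing the washed cells to OD600 1.0 and preparing the tenfold series is a C1V1 = C2V2 calculation; a small Python sketch follows. It assumes OD600 scales linearly with cell density over the range used (dense cultures are usually diluted before measuring), and the culture OD and final volume in the example are hypothetical.

```python
def volume_for_target_od(culture_od600, target_od600, final_volume_ul):
    """Volume (ul) of suspension at culture_od600 to bring to final_volume_ul at target_od600.

    Uses C1*V1 = C2*V2 and assumes OD600 is linear in cell density.
    """
    if culture_od600 < target_od600:
        raise ValueError("suspension is less dense than the target; concentrate the cells first")
    return target_od600 * final_volume_ul / culture_od600


def tenfold_series(start_od600=1.0, n=3):
    """OD600 values of an n-step tenfold dilution series (1.0, 0.1, 0.01 with the defaults)."""
    return [start_od600 / 10 ** i for i in range(n)]


if __name__ == "__main__":
    # Hypothetical: washed cells at OD600 = 4.0, 1 ml of OD600 = 1.0 suspension wanted.
    print(volume_for_target_od(4.0, 1.0, 1000.0), "ul of cells, made up to 1,000 ul with water")
    print("spotting series:", tenfold_series())
```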

Preparation of ADPR-Ub. To produce ADPR-Ub, 0.1 μM ANC SdeA H277A was used in the reaction mixture, a concentration chosen after testing enzyme concentrations for the best conversion. In brief, 0.1 μM ANC SdeA H277A was incubated with 0.4 mM NAD+ and 35 μM Ub at 37 °C for 3 h, after which the reaction mixture was concentrated and loaded onto a Superdex-75 column (GE Healthcare). The Ub peak fractions were pooled and subjected to mass spectrometric analysis to verify that all Ub was in the ADPR-Ub form; ADPR-Ub could be prepared to 100% purity by this method. The ADPR-Ub was then stored at −80 °C until use.
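
Mass spectrometric verification of complete conversion rests on the mass each modification adds to Ub. The Python sketch below computes average adduct masses from elemental formulae: ADP-ribose transferred from NAD+ (C15H21N5O13P2, that is, NAD+ minus nicotinamide and a proton) and the phosphoribose left after AMP release (C5H9O7P). The unmodified Ub average mass of 8,564.8 Da and the ±2 Da matching tolerance are assumptions, and one modification per Ub is assumed.

```python
import re

# Average atomic masses (Da) of the elements needed here.
AVERAGE_ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "P": 30.974}

UB_AVERAGE_DA = 8564.8         # assumption: average mass of unmodified human Ub (1-76)
ADPR_ADDUCT = "C15H21N5O13P2"  # NAD+ minus nicotinamide and H+, added to Ub R42
PR_ADDUCT = "C5H9O7P"          # phosphoribose remaining on Ub after AMP is released


def formula_mass(formula: str) -> float:
    """Average mass (Da) of a simple elemental formula such as 'C5H9O7P'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += AVERAGE_ATOMIC_MASS[element] * (int(count) if count else 1)
    return total


def expected_masses(ub_mass: float = UB_AVERAGE_DA) -> dict:
    """Expected average masses of unmodified, ADP-ribosylated and phosphoribosylated Ub."""
    return {
        "Ub": ub_mass,
        "ADPR-Ub": ub_mass + formula_mass(ADPR_ADDUCT),
        "PR-Ub": ub_mass + formula_mass(PR_ADDUCT),
    }


def classify(observed_da: float, tolerance_da: float = 2.0, ub_mass: float = UB_AVERAGE_DA):
    """Name of the species whose expected mass lies within tolerance of the observed mass, or None."""
    for name, mass in expected_masses(ub_mass).items():
        if abs(observed_da - mass) <= tolerance_da:
            return name
    return None
```

In this framing, a preparation counts as fully converted when every deconvoluted Ub species classifies as ADPR-Ub and none as Ub.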

In vitro ubiquitin-modification and RAB33B-ubiquitination assays. For the auto-ubiquitination experiments of SdeA(231-1190), 6 μM purified GST-tagged Ub was incubated with 0.9 μM SdeA(231-1190) and RAB33B at 37 °C for 1 h in the presence or absence of 0.1 mM NAD+ in a buffer containing 50 mM Tris pH 7.5 and 1 mM DTT. After the reaction, the samples were analysed by SDS-PAGE and Coomassie staining. For the RAB33B-ubiquitination experiments testing whether SdeA(231-1190) functions normally, 0.6 μM SdeA, or 0.9 μM wild-type SdeA(231-1190), SdeA(231-1190) E860A/E862A or SdeA(231-1190) H277A, was incubated with 1 mM NAD+ and 35 μM Ub in the presence or absence of 9.5 μM His-RAB33B at 37 °C for 5 min. The samples were then analysed using Tricine gels, Coomassie staining and immunoblotting with anti-His and anti-Ub antibodies. For the ubiquitination experiments of wild-type ANC SdeA and its mutants, 1 μM ANC SdeA (wild type or mutant) was incubated with 0.1 mM NAD+, 35 μM Ub and 7.4 μM His-RAB33B at 37 °C for 30 min. For the NADH-inhibition experiment, 1, 2 or 5 mM NADH was added to the reaction mixture, with the other components at the same concentrations as above. The samples were then analysed by SDS-PAGE. For phospho-specific staining of PR-Ub, Pro-Q Diamond stain was used according to the manufacturer's instructions.
For immunoblotting analysis, the following primary antibodies were used: anti-His (1:5,000, Transgene, HT501) and anti-Ub (1:500, Santa Cruz Biotechnology, sc-8017). For the time courses of the ubiquitination reactions, 1.09 μM ANC SdeA (wild type or mutants) or combinations of its fragments were incubated with 1 mM NAD+, 35 μM Ub and 9.5 μM His-RAB33B at 37 °C for the indicated times. The samples were stained with Coomassie and Pro-Q Diamond phosphoprotein stain.
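
Assembling these reactions at the stated final concentrations is a C1V1 = C2V2 calculation for each component, with buffer making up the remainder. The sketch below does that bookkeeping; the stock concentrations and the 20-μl reaction volume are hypothetical, chosen only to illustrate the 1.09 μM ANC SdeA, 1 mM NAD+, 35 μM Ub and 9.5 μM His-RAB33B condition.

```python
def pipetting_volumes(final_volume_ul, components):
    """Volumes (ul) of each stock for a reaction of final_volume_ul.

    components maps name -> (stock_uM, final_uM); the remainder is made up with buffer.
    """
    volumes = {}
    for name, (stock_um, final_um) in components.items():
        if final_um > stock_um:
            raise ValueError(f"{name}: final concentration exceeds stock concentration")
        volumes[name] = final_um * final_volume_ul / stock_um  # C1*V1 = C2*V2
    used = sum(volumes.values())
    if used > final_volume_ul:
        raise ValueError("stock volumes exceed the reaction volume; use more concentrated stocks")
    volumes["buffer"] = final_volume_ul - used
    return volumes


if __name__ == "__main__":
    # Hypothetical stocks (uM) for a 20-ul reaction.
    reaction = pipetting_volumes(20.0, {
        "ANC SdeA": (10.9, 1.09),
        "NAD+": (20_000.0, 1_000.0),
        "Ub": (700.0, 35.0),
        "His-RAB33B": (95.0, 9.5),
    })
    for name, vol in reaction.items():
        print(f"{name}: {vol:.2f} ul")
```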

Top-down LC-MS analysis of modified Ub and Ub-like proteins. Wild-type Ub was purchased from Boston Biochem (U-100H), and untagged Ub mutants were purified from E. coli cells. Ubiquitination reactions were performed in a 100-μl system in which 35 μM Ub, Ub mutant or Ub-like protein was incubated with the specified SdeA fragments or mutants for 2 h in a buffer containing 50 mM NaCl, 50 mM Tris pH 7.5 and 2 mM DTT. The reaction mixtures were then passed through 30-kDa molecular mass cut-off filters, and the modified Ub or Ub-like proteins in the filtrate were subjected to LC-MS analysis. A linear ion-trap mass spectrometer (LTQ Velos Pro, Thermo Scientific) was used for total molecular mass analyses. Liquid chromatography separation was carried out on an EASY-nLC 1200 system (Thermo Scientific). The capillary column (75 μm × 150 mm) with a laser-pulled electrospray tip was home-packed with 4-μm, 100-Å Magic C4AQ silica-based particles (Michrom BioResources). The mobile phase consisted of solvent A (97% H2O, 3% ACN and 0.1% FA) and solvent B (20% H2O, 80% ACN and 0.1% FA). The following gradient was used: solvent B started at 20% for 3 min and was then raised to 50% over 20 min; subsequently, solvent B was rapidly increased to 70% over 2 min and maintained for 20 min, before 100% solvent A was used for column equilibration. Proteins eluted from the capillary column were electrosprayed directly into the mass spectrometer, and one full mass-spectrometry scan (m/z 600-1,500) was acquired.
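
Two numbers in this LC-MS set-up are easy to sanity-check in code: the solvent B programme and which charge states of intact Ub fall inside the m/z 600-1,500 scan window. The Python sketch below encodes the gradient as stated (treating the re-equilibration phase as 0% B is a simplification) and uses the electrospray relation m/z = (M + z × 1.007276)/z for [M + zH]z+ ions, with an assumed Ub average mass of 8,564.8 Da in the test.

```python
PROTON_DA = 1.007276  # proton mass (Da)

# (time in min, % solvent B) breakpoints restating the gradient described above
GRADIENT = [(0.0, 20.0), (3.0, 20.0), (23.0, 50.0), (25.0, 70.0), (45.0, 70.0)]


def percent_b(t_min: float) -> float:
    """%B at time t by linear interpolation; 0% B (100% A) after the last breakpoint."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    return 0.0  # simplification: column re-equilibration in 100% solvent A


def mz(mass_da: float, z: int) -> float:
    """m/z of the [M + zH]^z+ ion."""
    return (mass_da + z * PROTON_DA) / z


def charges_in_window(mass_da: float, lo: float = 600.0, hi: float = 1500.0, max_z: int = 60) -> list:
    """Charge states whose m/z falls inside the scan window."""
    return [z for z in range(1, max_z + 1) if lo <= mz(mass_da, z) <= hi]
```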

In vitro GST pull-down assay. To test whether SdeA binds IcmS alone or in complex, GST-fused full-length SdeA was preloaded on glutathione resin and then incubated with IcmS, IcmS-IcmW, IcmS-IcmW-DotLc or IcmS-IcmW-DotLc-LvgA at 18 °C for 1 h. The resin-bound samples were washed three times with washing buffer (50 mM Tris pH 7.5, 50 mM NaCl) and then analysed by SDS-PAGE and Coomassie blue staining. To map the region of SdeA responsible for IcmS binding, GST-fused full-length IcmS was preloaded on glutathione resin and then incubated with different SdeA fragments at 18 °C for 1 h. The samples were treated as described above.

Gel-filtration binding assay. SdeA(1191-1350) and the IcmS-IcmW-DotLc complex, purified as described above, were subjected to gel-filtration analysis (Superdex 200, GE Healthcare). The two were mixed at a molar ratio of about 1:1 and incubated at 18 °C for 4 h before gel-filtration analysis in buffer containing 10 mM Tris pH 8.0 and 100 mM NaCl. Samples from the relevant fractions were analysed by SDS-PAGE and visualized by Coomassie blue staining.

Statistics and reproducibility. No statistical methods were used to predetermine 
sample size. All of the in vitro assays presented in this work were repeated at least 
three times with similar results. The experiments were not randomized and the 
investigators were not blinded to allocation during experiments and outcome 
assessment. 

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. Coordinates and structure factors have been deposited in the Protein Data Bank (PDB) under accession codes 5YIM (SdeA(231-1190)), 5YIK (SdeA(231-1190)-Ub) and 5YIJ (SdeA(231-1190)-Ub-NADH). Uncropped versions of all gels are displayed in Supplementary Fig. 1. All other data are available from the corresponding author upon reasonable request.
…and over 50% homology are shaded in dark blue, pink and light blue,


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


Extended Data Fig. 2 | See next page for caption.
[Figure panels: gels of His-hRab33b ubiquitination by SdeA FL, SdeA(231-1190), SdeA(E/A)231-1190 and SdeA(H277A)231-1190 (Coomassie; IB: His; IB: Ub); GST-Ub self-ubiquitination (Coomassie; IB: Ub; Pro-Q Diamond phosphoprotein stain); NADH titration (0-5 mM); c(s) distribution; gel filtration profile (mAU) with molecular mass markers.]
ARTICLE 







Extended Data Fig. 2 | SdeA(231-1190) is an active monomer in solution. a, SdeA, wild-type SdeA(231-1190), SdeA(231-1190) E860A/E862A or SdeA(231-1190) H277A was incubated with NAD+ and Ub in the presence or absence of His-RAB33B. Ubiquitinated His-RAB33B was analysed using Tricine gels, Coomassie staining and immunoblotting with anti-His and anti-Ub antibodies. b, SdeA(231-1190) and RAB33B(15-202) were incubated with GST-Ub and NAD+, and self-ubiquitinated SdeA was detected by Coomassie staining, immunoblotting with anti-Ub antibodies and Pro-Q Diamond phosphoprotein staining. c, SdeA(231-1190), NAD+ and RAB33B were incubated with 0, 1, 2 or 5 mM NADH. The ubiquitination reactions were analysed using Tricine gels and Coomassie staining. d, Analytical ultracentrifugation showed that SdeA(231-1190) is a monomer, yielding a sedimentation coefficient of 5.13 S and a molecular mass of approximately 106 kDa. The buffer was 10 mM Tris pH 8.0, 200 mM NaCl and 5 mM DTT. e, Gel filtration profiles of SdeA(231-1190) and the molecular mass markers on a Superdex-75 column (GE Healthcare). The sizes of the markers are marked above the peaks. SdeA(231-1190) fractions collected from the Superdex-75 column were run on SDS-PAGE gels and detected by Coomassie staining.

a-e, Similar results were obtained in three independent experiments.

a-c, e, Uncropped blots and gel images are shown in Supplementary Fig. 1. f, Two views of the superimposition of the structures of the two molecules in the asymmetric unit, shown in different colours. g, The structure of the CTD region in the crystallized protein can be divided into two parts (left and right). The α helices are numbered according to their order in the region spanning residues 908-1190. h, Topological diagram of the CTD region shown in g. The N and C termini of the pCTD domain are labelled.
Extended Data Fig. 3 | Interactions between SdeA mART and SdeA PDE are essential for the activity of SdeA mART. a, Overview of the interactions between SdeA mART and SdeA PDE. SdeA is coloured as in Fig. 1b. The major interaction region between the two domains is outlined. b, Expanded view of the region outlined in a. Interacting residues are shown in stick representation, and red dashed lines represent polar interactions. The plug loop in SdeA mART is indicated. c, The interaction between SdeA mART and SdeA PDE, with SdeA mART shown as a cartoon and SdeA PDE as an electrostatic surface model. d, The view in c rotated by 180°, with SdeA mART shown as an electrostatic surface model and SdeA PDE as a cartoon. e, Testing the ability of SdeA PDE to process ADPR-Ub into PR-Ub. SdeA(231-588), wild-type ANC SdeA or the H277A mutant was incubated with ADPR-Ub and RAB33B for 30 min. The samples were stained with Coomassie and Pro-Q Diamond phosphoprotein stain. f, Testing the importance of the domain interaction for the activity of SdeA mART. Various SdeA segments, and mixtures of SdeA(231-588) and SdeA(597-935) or SdeA(193-935) H277A, were incubated with RAB33B, NAD+ and Ub for 30 min. The samples were analysed using Coomassie staining, immunoblotting with anti-Ub antibodies and Pro-Q Diamond phosphoprotein staining. e, f, Similar results were obtained in three independent experiments. Uncropped blots and gel images are shown in Supplementary Fig. 1.
Extended Data Fig. 4 | SdeA mART exhibits novel conformations of the ARTT and PN loops. a, Superimposition of SdeA mART (green) and HopU1 (pink) from Pseudomonas syringae (PDB: 3U0J). The ARTT loop is indicated. The r.m.s.d. value is given beside the PDB code (panels b and c are arranged in the same way). b, Superimposition of SdeA mART (green) and the ADP-ribosyltransferase Vis (blue) (PDB: 4XZK). c, Superimposition of SdeA mART (green) and XopAI from Xanthomonas axonopodis pv. citri (cyan) (PDB: 4ELN). d, Superimposition of the SdeA mART structure (green) and the three other structures from a-c. e, f, Mass spectra of the samples in Fig. 2f. The sample names and their molecular masses are indicated in the figures. g, h, Different fragments and combinations of SdeA proteins were incubated with Ub, RAB33B and NAD+ at 37 °C for the indicated times. The samples were analysed using Coomassie staining and Pro-Q Diamond phosphoprotein staining. i, Testing the ubiquitination ability of the catalytic core. ANC SdeA or SdeA(193-935) (0.09 or 0.9 μM) was incubated with or without Ub and NAD+ for the indicated times. The samples were analysed using Coomassie staining, immunoblotting with anti-Ub antibodies and Pro-Q Diamond phosphoprotein staining. e-i, Similar results were obtained in three independent experiments. g-i, Uncropped blots and gel images are shown in Supplementary Fig. 1.
Extended Data Fig. 5 | SdeA(231-1190) binds three Ub molecules. a, Overall structure of the SdeA(231-1190)-Ub complex. SdeA is coloured as in Fig. 1b. The three Ub molecules are coloured magenta and labelled Ub1-3 according to the order of their binding regions in SdeA(231-1190). Q935 and S998, the C termini of two constructs commonly used in this study, are shown as spheres. b, Ub binding causes prominent structural changes in SdeA. The SdeA-Ub complex structure is shown as in a, and the apo-SdeA structure is coloured pink. The N-terminal region of SdeA pCTD, which undergoes pronounced conformational changes, is circled. c, d, Expanded views of the two Ub-binding sites in SdeA pCTD. The proteins are coloured as in a. Red dashed lines indicate polar interactions. e-h, Structural alignments of the Ub molecule (magenta) in the SdeA mART-Ub complex with the proximal (yellow) and distal (orange) Ubs of K11- (e), K48- (f), K63- (g) and M1-linked (h) diubiquitins. The two R42 residues in each of the four diUbs are shown in stick representation.
[Figure panel labels: K48-linked diUb, PDB: 2KDF; M1-linked diUb, PDB: 5WQ4; proximal and distal Ubs.]
Extended Data Fig. 6 | Specific recognition of Ub by SdeA mART.
a, The interaction between SdeA mART and Ub. SdeA mART is shown
as a surface electrostatic potential model and Ub is in magenta cartoon
representation. The R42, R72 and R74 residues of Ub are shown in stick
representation. b, Ub®”” and Ub®” are bound in the negatively charged
groove of SdeA mART. The front part of SdeA mART is cut away to reveal
the inner surface. c, Superimposition of SUMO1 (PDB: 1WM3), NEDD8
(PDB: 1NDD) and Ub in the SdeA mART-Ub complex. The conserved Arg
residues in Ub, SUMO1 and NEDD8 are shown in stick representation, out
of which Ub R42, Ub R72 and Ub R74 are marked. The polar interactions with
Ub®” and Ub®” are shown as red dashed lines. d, The purified SUMO and
NEDD8 proteins were incubated with SdeA(193-935)4?774 and NAD+
under the conditions stated in the ‘Top-down LC-MS analysis of modified 
Ub and Ub-like proteins’ section of the Methods. Mass spectra of the 
samples are also shown. The sample names and their molecular masses are 
indicated in the figures. Similar results were obtained in three independent 
experiments. 
Extended Data Fig. 7 | See next page for caption.
Extended Data Fig. 7 | Molecular dynamics simulations indicate the
movements of the side chains of Ub®”? and Ub®”. a-d, Wild-type
ANC SdeA and indicated mutants were incubated with Ub, RAB33B
and NAD+ at 37 °C for the indicated amounts of time. The samples were
analysed using Coomassie staining and Pro-Q diamond phosphoprotein 
staining. Similar results were obtained in three independent experiments. 
Uncropped blots and gel images are shown in Supplementary Fig. 1. 
e, The structure of the SdeA mART-Ub-NADH complex. SdeA mART
is shown as an electrostatic surface potential model. White, blue and red
indicate neutral, positive and negative surfaces, respectively. Shown in
green mesh is the 2Fo–Fc electron density map contoured at 1σ around
the NADH molecule. f, Galactose-inducible pYES2 plasmids containing 
wild-type ANC SdeA or the mutants were transformed into yeast W303 
strain. Five microlitres of cells in three tenfold serial dilutions were spotted 
on both glucose- and galactose-containing plates lacking uracil for two 
days before image acquisition. g, Purified ADPR-Ub proteins were treated 
with or without wild-type ANC SdeA. The samples were analysed using 
Coomassie staining and Pro-Q diamond phosphoprotein staining.
h, Purified ADPR-Ub protein was subjected to top-down LC-MS analysis. 
The results indicated 100% ADPR-Ub. i, Wild-type ANC SdeA or other 
mutants were incubated with RAB33B and the prepared ADPR-Ub verified 
in g and h. The samples were analysed using SDS-PAGE, with Coomassie 
staining and Pro-Q diamond phosphoprotein staining. f-i, Similar results 
were obtained in three independent experiments. g, i, Uncropped blots 
and gel images are shown in Supplementary Fig. 1. j, k, The time series for 
the r.m.s.d. of the non-hydrogen atoms of the protein-ligand complex (j) 
and the ligand (k) in the SdeA mART-Ub-intermediate and SdeA mART-Ub-NAD+
systems during molecular dynamics simulations. These two
plots indicate that both systems reached equilibrium during the 200-ns
simulations. l, m, The time series for the shortest distance between the
NH1/2 atom of Ub®” and C1D of the ligand (l) and the distance between
the NH1/2 atom of Ub®” and C1D of the ligand (m) in the two systems
during molecular dynamics simulations. 
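The r.m.s.d. time series in panels j and k are the usual way of judging whether a molecular dynamics system has equilibrated: the curve should level off. As a minimal sketch only, assuming every frame has already been superimposed onto the reference structure (the least-squares fitting step is omitted), the per-frame value over non-hydrogen atoms could be computed as below. The helper names `heavy_atoms`, `rmsd` and `rmsd_series` and the toy coordinates are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def heavy_atoms(coords: Sequence[Coord], elements: Sequence[str]) -> List[Coord]:
    """Keep only non-hydrogen atoms (element symbol other than 'H')."""
    return [xyz for xyz, element in zip(coords, elements) if element.upper() != "H"]


def rmsd(frame: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """R.m.s.d. between two equally ordered atom lists that are already superimposed."""
    if len(frame) != len(reference) or not frame:
        raise ValueError("frame and reference must have the same non-zero length")
    squared = sum(
        (x - rx) ** 2 + (y - ry) ** 2 + (z - rz) ** 2
        for (x, y, z), (rx, ry, rz) in zip(frame, reference)
    )
    return math.sqrt(squared / len(frame))


def rmsd_series(trajectory: Sequence[Sequence[Coord]], reference: Sequence[Coord]) -> List[float]:
    """One r.m.s.d. value per frame, as plotted against simulation time."""
    return [rmsd(frame, reference) for frame in trajectory]


# Toy example: two heavy atoms plus one hydrogen; the second frame is shifted 1 Å along z.
reference_all = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.0, 0.0)]
elements = ["C", "N", "H"]
frames_all = [
    reference_all,
    [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (2.0, 1.0, 1.0)],
]
reference = heavy_atoms(reference_all, elements)
trajectory = [heavy_atoms(frame, elements) for frame in frames_all]
series = rmsd_series(trajectory, reference)
print(series)  # [0.0, 1.0]
```

In a real trajectory the series would be computed for every saved frame; a flat tail of the curve is what supports the equilibrium statement in the legend.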
Extended Data Fig. 8 | See next page for caption.
Extended Data Fig. 8 | The overall shape of SdeA and the function of
SdeA CTD. a, Superimposition of various Ub structures (PDB codes: 
5M93 (orange), 1UBQ (pink), 5CRA (chain C, cyan), 3ZLZ (chain B, 
yellow) and 4BOZ (chain B, grey)) onto the SdeA mART-Ub-NADH 
structure with R42 residues of all the Ub molecules shown in stick 
representation. SdeA mART and Ub from the SdeA mART-Ub-NADH 
complex are shown in green and magenta, respectively. b, In vitro GST 
pull-down assays to detect the interactions of SdeA with IcmS or its 
complexes. GST-fused SdeA protein was incubated with IcmS, the IcmS- 
IcmW complex, the IcmS-IcmW-DotLc (residues 656-783 of DotL) 
ternary complex or the IcmS-IcmW-DotLc-LvgA quaternary complex.
The protein samples bound to glutathione resins were washed three
times and analysed by SDS-PAGE and Coomassie blue staining. IcmS/W
represents IcmS + IcmW. The band marked with an asterisk represents
the degraded GST tag. Similar results were obtained in three independent
experiments. Uncropped blots and gel images are shown in Supplementary
Fig. 1. c, Experimental PDDFs (pair distance distribution functions) for
SdeA(231-1190), SdeA(1-1499), SdeA(1-1190) and SdeA(1092-1496).
d, Overlay of the experimental scattering profiles (exp) from the four
samples in the SAXS analysis with the back-calculated scattering profile
of the crystal structure of SdeA(231-1190) (cal). e, Fitting the crystal
structure of SdeA(231-1190) into the SAXS envelope of SdeA(231-1190). 
Two perpendicular views are shown. f, Superimposition of the SAXS 
envelopes of SdeA(231-1190) (coloured as in e) and SdeA(1-1190) (light 
magenta) with the crystal structure of SdeA(231-1190) fitted. g, SAXS 
envelopes of SdeA(1092-1496). h, Superimposition of the SAXS envelopes 
of SdeA(1-1190) (light magenta), SdeA(1092-1496) (cyan) and 
SdeA(1-1499) (wheat) with the crystal structure of SdeA(231-1190) fitted. 
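The radii of gyration "from Guinier fitting" reported for these SAXS samples rest on the Guinier approximation, ln I(q) ≈ ln I(0) − q²Rg²/3, which holds at low q (a common rule of thumb for globular proteins is q·Rg ≲ 1.3). The sketch below fits a straight line to ln I against q² on synthetic, noise-free data; it illustrates the principle only and is not the PRIMUS implementation. Function names and the Rg = 42 Å test curve are hypothetical.

```python
import math
from typing import Sequence, Tuple


def guinier_fit(q: Sequence[float], intensity: Sequence[float]) -> Tuple[float, float]:
    """Fit ln I = ln I0 - (Rg^2 / 3) q^2 by ordinary least squares; return (Rg, I0)."""
    if len(q) != len(intensity) or len(q) < 2:
        raise ValueError("need at least two (q, I) points")
    x = [qi * qi for qi in q]
    y = [math.log(i) for i in intensity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("non-negative Guinier slope; check the fitting range")
    intercept = mean_y - slope * mean_x
    return math.sqrt(-3.0 * slope), math.exp(intercept)


def within_guinier_range(q: Sequence[float], rg: float, limit: float = 1.3) -> bool:
    """Rule-of-thumb check that q_max * Rg does not exceed the chosen limit."""
    return max(q) * rg <= limit


# Synthetic curve for a particle with Rg = 42 Å (illustrative values only).
q_values = [0.005 + 0.001 * k for k in range(26)]  # 0.005 to 0.030 per Å
i_values = [100.0 * math.exp(-(qv * 42.0) ** 2 / 3.0) for qv in q_values]
rg, i0 = guinier_fit(q_values, i_values)
print(round(rg, 2), round(i0, 2), within_guinier_range(q_values, rg))  # 42.0 100.0 True
```

With measured data the fitting range is usually chosen iteratively, because the q·Rg limit depends on the Rg being estimated.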
Extended Data Table 1 | Data collection and refinement statistics

                            SdeA (5YIM)             SdeA-Ub (5YIK)          SdeA-Ub-NADH (5YI)
Data collection
Space group                 C2221                   C2                      C2
Cell dimensions
  a, b, c (Å)               139.7, 295.4, 194.6     108.9, 145.9, 104.1     107.8, 145.9, 103.4
  α, β, γ (°)               90.00, 90.00, 90.00     90.00, 104.46, 90.00    90.00, 103.82, 90.00
Resolution (Å)              50-3.39 (3.52-3.39)     50-3.10 (3.21-3.10)     50-3.18 (3.31-3.18)
Rsym or Rmerge (%)          6.9 (95)                11.1 (76.8)             8.2 (79.1)
I/σ(I)                      14.8 (1.43)             16.0 (2.41)             25.9 (4.2)
Completeness (%)            99.7 (99.7)             99.9 (100.0)            99.7 (100.0)
Redundancy                  3.8 (3.9)               5.3 (5.3)               6.5 (6.7)
Refinement
Resolution (Å)              50.0-3.39 (3.52-3.39)   42.54-3.10 (3.21-3.10)  39.96-3.18 (3.30-3.18)
No. reflections             55636 (5274)            25234 (1380)            24125 (1486)
Rwork / Rfree               0.2510/0.2870           0.2230/0.2790           0.2261/0.2752
No. atoms                   14128                   9381                    9410
  Protein                   14128                   9381                    9366
  Ligand/ion                                                                44
  Water
B factors
  Protein
  Ligand/ion
  Water
R.m.s. deviations
  Bond lengths (Å)          0.004                   0.008                   0.006
  Bond angles (°)           0.96                    1.02                    0.93

For each structure one crystal was used.
Values in parentheses are for the highest-resolution shell.
Extended Data Table 2 | Data collection and structural parameters derived from SAXS experiments

                                     SdeA(1-1499)    SdeA(231-1190)  SdeA(1-1190)    SdeA(1092-1496)
Data collection parameters
Facilities and parameters            Settings and values
  Beam line                          12ID-B (APS, ANL)    BL19U2 (SSRF)
  Wavelength (Å)                     0.8857               1.033
  Detector                           Pilatus 1M (SAXS)    Pilatus 100K (SAXS)
  q range (Å⁻¹)                      0.006-2.5            0.009-0.415
  Exposure time (s)                  60                   60
  Concentration range (mg/ml)        0.75-3               ----
  Temperature (K)                    300                  300
Structural parameters
  Rg from Guinier fitting (Å)*       54.96 ± 1.66    41.66 ± 1.15    45.40 ± 1.17    43.50 ± 0.35
  Rg from GNOM (Å)*                  55.83 ± 0.28    42.58 ± 0.27    47.34 ± 0.21    44.97 ± 0.14
  Dmax (Å)                           203             148             162             150
  V_Porod from PRIMUS (Å³)           330.28 × 10³    185.69 × 10³    209.82 × 10³    81.73 × 10³
  V_c (Å²)                           1098.56         728.42          818.11          487.69
  MW_pred (kDa)a                     170.1           108.6           133.5           46.4
  MW_SAXS1 (kDa)b                    175.60          101.23          114.85          42.94
  MW_SAXS2 (kDa)c                    206.43          116.06          131.14          51.08
  NSD of DAMMIN models*              0.849 ± 0.025   0.705 ± 0.021   0.763 ± 0.020   0.804 ± 0.019
Software employed
  Primary data processing            Igor Pro/PRIMUS
  P(r) function                      GNOM
  Ab initio shape analysis           DAMMIN
  SAXS profile computation           CRYSOL
  Superposition and averaging        SUPCOMB/DAMAVER

a The predicted molecular mass (MW_pred) was calculated from the primary sequences of components.
b The molecular mass from SAXS1 (MW_SAXS1) was calculated using the previously developed Rg/Vc power law43.
c The molecular mass from SAXS2 (MW_SAXS2) was calculated from the Porod volume using AUTOPOROD.
*Data are mean ± s.d. from fitting.
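The two SAXS-derived molecular masses in Extended Data Table 2 (SAXS1 from the Rg/Vc power law, SAXS2 from the Porod volume) can be cross-checked with simple arithmetic. The constants below are assumptions, not values stated in the table: a Porod-volume-to-mass ratio of 1.6 Å³ per Da, and the protein parameters c = 0.1231 and exponent k = 1 for the power law MW = (Qr/c)^k with Qr = Vc²/Rg, using Rg from GNOM. With these choices, and reading the Porod volumes as ×10³ Å³, the sketch reproduces the tabulated masses to within a few hundredths of a kDa. AUTOPOROD itself may use a different internal procedure; the function names are hypothetical.

```python
from typing import Dict, Tuple

# Values transcribed from Extended Data Table 2:
# (V_Porod in Å^3, V_c in Å^2, Rg from GNOM in Å)
SAXS_TABLE: Dict[str, Tuple[float, float, float]] = {
    "SdeA(1-1499)": (330.28e3, 1098.56, 55.83),
    "SdeA(231-1190)": (185.69e3, 728.42, 42.58),
    "SdeA(1-1190)": (209.82e3, 818.11, 47.34),
    "SdeA(1092-1496)": (81.73e3, 487.69, 44.97),
}


def mw_from_porod_volume(v_porod: float, angstrom3_per_da: float = 1.6) -> float:
    """Mass in kDa from the Porod volume, assuming a fixed volume-to-mass ratio."""
    return v_porod / angstrom3_per_da / 1000.0


def mw_from_rg_vc(v_c: float, rg: float, c: float = 0.1231, k: float = 1.0) -> float:
    """Mass in kDa from the Rg/Vc power law, MW = (Qr / c) ** k with Qr = Vc**2 / Rg."""
    q_r = v_c ** 2 / rg
    return (q_r / c) ** k / 1000.0


for name, (v_porod, v_c, rg) in SAXS_TABLE.items():
    print(f"{name}: Porod {mw_from_porod_volume(v_porod):.2f} kDa, "
          f"Rg/Vc {mw_from_rg_vc(v_c, rg):.2f} kDa")
```

Comparing both estimates with the sequence-predicted mass is a quick way to flag aggregation or flexibility in a SAXS sample.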
Spatiotemporal regulation of liquid-like
condensates in epigenetic inheritance 


Gang Wan1*, Brandon D. Fields1,2*, George Spracklin1, Aditi Shukla1, Carolyn M. Phillips2 & Scott Kennedy1


Non-membrane-bound organelles such as nucleoli, processing bodies, Cajal bodies and germ granules form by the 
spontaneous self-assembly of specific proteins and RNAs. How these biomolecular condensates form and interact is 
poorly understood. Here we identify two proteins, ZNFX-1 and WAGO-4, that localize to Caenorhabditis elegans germ 
granules (P granules) in early germline blastomeres. Later in germline development, ZNFX-1 and WAGO-4 separate from 
P granules to define an independent liquid-like condensate that we term the Z granule. In adult germ cells, Z granules 
assemble into ordered tri-condensate assemblages with P granules and Mutator foci, which we term PZM granules.
Finally, we show that one biological function of ZNFX-1 and WAGO-4 is to interact with silencing RNAs in the C. elegans 
germline to direct transgenerational epigenetic inheritance. We speculate that the temporal and spatial ordering of liquid 
droplet organelles may help cells to organize and coordinate the complex RNA processing pathways that underlie gene-regulatory systems, such as RNA-directed transgenerational epigenetic inheritance.


Epigenetic information can be inherited for several generations (trans- 
generational epigenetic inheritance, TEI)'”. Non-coding RNAs have 
emerged as important mediators of TEI (RNA-directed TEI), although 
the mechanism(s) by which RNA mediates TEI remains poorly understood.
In many eukaryotes, double-stranded RNAs (dsRNAs) silence
other cellular RNAs that exhibit sequence complementarity to the trigger
dsRNAs, a process termed RNA interference (RNAi)*. In C. elegans,
RNAi is heritable: distant progeny of animals exposed to dsRNAs
continue to silence complementary RNAs in the absence of further dsRNA
exposure (RNAi inheritance)*®. To further our understanding of
RNA-directed TEI, we conducted a genetic screen to identify factors required
for RNAi inheritance (Extended Data Fig. 1). Our screen identified 37
mutations that disrupted RNAi inheritance. We subjected DNA from
these 37 mutant strains to whole-genome sequencing and identified
four independent mutations in the gene zk1067.2 (Fig. 1a). To confirm
that zk1067.2 is required for RNAi inheritance, we tested two additional
alleles of zk1067.2 (gk458570 and gg561) for defects in RNAi
inheritance. gk458570 and gg561 animals responded normally to dsRNA
treatment; however, the progeny of these mutant animals were largely
unable to inherit gene silencing (Fig. 1b and Extended Data Fig. 2). We
conclude that zk1067.2 is required for RNAi inheritance.


ZNFX-1 is required for TEI 
Sequence analysis showed that ZK1067.2 is predicted to encode a 
2,443-amino acid protein that contains a superfamily one (SF1) RNA 
helicase domain and a zinc-finger domain (Fig. 1a). A single putative 
orthologue of ZK1067.2 was found in most eukaryotic genomes. Fungal 
orthologues have been linked to RNAi pathways in Schizosaccharomyces 
pombe and Neurospora crassa*®”®. Homology between ZK1067.2 and 
its mammalian orthologue ZNFX1 extends to a zinc-finger domain
not present in fungal orthologues. We conclude that ZK1067.2 is a 
conserved protein involved in RNAi-mediated gene silencing in many 
eukaryotes. Hereafter, we refer to ZK1067.2 as ZNFX-1. 

To begin to understand the function of ZNFX-1 during RNAi inheritance,
we used CRISPR-Cas9 to insert a 3xflag::gfp epitope immediately
upstream of the znfx-1 start codon. Note that CRISPR-mediated
gene conversion was used throughout this work. Tagged loci were
expressed near wild-type levels and the resultant fusion proteins were
functional unless otherwise indicated (Extended Data Fig. 3). We observed
GFP::ZNFX-1 expression in the adult germline as well as in develop- 
ing germ cells during all stages of embryonic and larval development 
(Fig. 1c). No GFP::ZNFX-1 expression was observed in somatic tissues. 
After fertilization, C. elegans zygotes undergo a series of asymmetric 
cell divisions in which germline determinants segregate with germline 
blastomeres. During embryonic development, ZNFX-1 foci were con- 
centrated in, and segregated with, the germline blastomeres (Fig. 1c and 
see below). In adult germ cells, GFP::ZNFX-1 was concentrated in foci 
that were distributed in a perinuclear pattern around nuclei (Fig. 1d). 
We conclude that znfx-1 encodes a germline-expressed protein that 
segregates with the germline and localizes to perinuclear foci in adult 
germ cells. 

Treatment of animals with oma-1 dsRNA silences the oma-1 gene 
for several generations”®. To address when ZNFX-1 acts to promote 
RNAi inheritance, we used quantitative reverse transcription PCR 
(qRT-PCR) to measure oma-1 mRNA and precursor mRNA (pre-mRNA)
levels in znfx-1(−) animals exposed to oma-1 RNAi, as well
as in the progeny of these animals. znfx-1(−) animals responded
normally to oma-1 RNAi; however, their progeny failed to inherit silencing,
suggesting that ZNFX-1 acts during the inheritance phase of RNAi
(Fig. 1e). During RNAi inheritance, short interfering RNAs (siRNAs)
that target genes undergoing RNAi silencing are expressed for several
generations®. In znfx-1(−) animals exposed directly to oma-1 dsRNA,
oma-1 siRNAs were produced at wild-type levels; however, the prog- 
eny of these mutant animals failed to express oma-1 siRNAs (Fig. 1f). 
Three additional genetic and biochemical analyses supported the idea 
that ZNFX-1 acts during the inheritance phase of RNAi (Extended 
Data Fig. 4). These data establish that ZNFX-1 is a dedicated RNAi 
inheritance factor. 
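The qRT-PCR measurements behind Fig. 1e are normalized to eft-3 pre-mRNA, but this excerpt does not state the exact calculation. One widely used option is the comparative Ct (2^−ΔΔCt) method, sketched below under the assumptions that both amplicons double every cycle (100% efficiency) and that an untreated sample serves as the calibrator. The function name and the Ct values are hypothetical.

```python
def relative_expression(ct_target: float, ct_reference: float,
                        ct_target_calibrator: float, ct_reference_calibrator: float) -> float:
    """Comparative Ct (2^-ddCt) estimate of target level relative to a calibrator sample.

    The reference gene plays the role of eft-3 pre-mRNA in Fig. 1e; a perfect
    doubling per cycle is assumed for both amplicons.
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_calibrator - ct_reference_calibrator
    delta_delta = delta_sample - delta_calibrator
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: relative to the reference, the target needs 3 more cycles
# in the RNAi-treated sample than in the untreated calibrator, i.e. about 8-fold less RNA.
print(relative_expression(25.0, 20.0, 22.0, 20.0))  # 0.125
```

If primer efficiencies differ noticeably from 100%, an efficiency-corrected model would be more appropriate than this simple form.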


WAGO-4 acts with ZNFX-1 to direct TEI 
The C. elegans genome encodes approximately 27 Argonaute (AGO) 
proteins. The molecular function of many of these AGOs remains 
Fig. 1 | ZNFX-1 is a conserved RNA helicase required for RNAi
inheritance in C. elegans. a, znfx-1 alleles are indicated. aa, amino acids.
b, Animals expressing a pie-1::gfp::h2b transgene’” were exposed to
gfp dsRNA. Progeny were grown in the absence of dsRNA, and GFP
expression in oocytes was visualized by fluorescence microscopy using
a 63× objective. The percentage of animals that express pie-1::gfp::h2b
is shown (n = 6 biologically independent samples for wild type (WT),
n = 3 for rde-1, znfx-1 and wago-4, each n > 30 animals). P0, parental
generation. c, Fluorescence micrograph of gfp::znfx-1 animals. Images
are representative of more than three animals visualized at each life stage
using a 60× objective. d, Pachytene germ cells of animals that express
GFP::ZNFX-1 and the chromatin marker mCherry::HIS-58. Image is
representative of three animals, visualized using a 60× objective.
e, Wild-type, hrde-1(tm1200) and znfx-1(gg561) animals were exposed to oma-1
dsRNA. Total RNA from RNAi (P0) and inheriting (F1) generations was
isolated. RNA was quantified by qRT-PCR using primers 5′ to the site of
RNAi and data were normalized to eft-3 pre-mRNA. n = 4 biologically
independent samples. Data are mean ± s.d. Note that independent mRNA
and pre-mRNA primer sets gave similar results (data not shown). f, siRNA
libraries (see Methods) were prepared from wild-type, znfx-1(gg561) and
wago-4(tm1019) animals exposed to oma-1 dsRNA (P0) and their progeny (F1).
Antisense reads mapping to the oma-1 locus are shown. The red line indicates
the region of the oma-1 locus targeted by dsRNA. Read counts were normalized
to the total number of sequenced reads (n = 2 biologically independent
samples).
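Fig. 1f plots antisense oma-1 reads normalized to the total number of sequenced reads. A minimal sketch of that kind of normalization follows, assuming reads are given as hypothetical (start, end, strand) tuples in half-open coordinates on the same chromosome as the locus, and that counts are scaled per million total reads; the caption only says "normalized to total number of sequenced reads", so the per-million factor is an assumption. Function names and coordinates are illustrative.

```python
from typing import Iterable, Tuple

Read = Tuple[int, int, str]  # (start, end, strand), half-open interval [start, end)


def count_antisense(reads: Iterable[Read], locus_start: int, locus_end: int,
                    locus_strand: str) -> int:
    """Count reads that overlap the locus and lie on the opposite strand."""
    opposite = "-" if locus_strand == "+" else "+"
    return sum(
        1 for start, end, strand in reads
        if strand == opposite and start < locus_end and end > locus_start
    )


def per_million(count: int, total_reads: int) -> float:
    """Scale a raw count by library size (reads per million sequenced reads)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return count * 1_000_000 / total_reads


# Hypothetical locus on the + strand and a handful of mapped 22-nt reads.
reads = [(100, 122, "-"), (150, 172, "-"), (180, 202, "+"), (390, 412, "-"), (900, 922, "-")]
n = count_antisense(reads, locus_start=100, locus_end=400, locus_strand="+")
print(n, per_million(n, total_reads=2_000_000))  # 3 1.5
```

Scaling by library size lets libraries of different sequencing depth (for example P0 and F1 samples) be plotted on a common axis.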


unknown. Two of the mutant strains identified by our genetic screen 
harboured mutations in the AGO-encoding gene wago-4. To confirm that
WAGO-4 is required for RNAi inheritance, we tested two additional
wago-4 deletion alleles (tm1019 and tm2401) for RNAi inheritance 
defects. Both alleles exhibited RNAi inheritance defects (Fig. 1b and 
Extended Data Fig. 5). Thus, like ZNFX-1, WAGO-4 is required 
for RNAi inheritance. Furthermore, when we appended a gfp tag 
to the wago-4 locus, we observed that, like ZNFX-1, WAGO-4 is a 
germline-expressed protein that segregates with the P lineage blas- 
tomeres and localizes to perinuclear foci (Extended Data Fig. 5). 
For unknown reasons, our GFP::WAGO-4 fusion protein was fully 
functional for RNAi inheritance in some RNAi inheritance assays 
but only partially functional in other assays (Extended Data Fig. 3). 
TagRFP::ZNFX-1 and GFP::WAGO-4 colocalized in germ cells, sug- 
gesting that WAGO-4 and ZNFX-1 may act together to promote RNAi 
inheritance (Fig. 2a). Three additional lines of evidence support this 
idea. First, Flag-tagged WAGO-4 (3×Flag::WAGO-4) co-precipitates
with haemagglutinin-tagged ZNFX-1 (HA::ZNFX-1), but not with 
Fig. 2 | ZNFX-1 and WAGO-4 act cooperatively to drive RNAi
inheritance. a, Fluorescent micrographs of pachytene germ cells that
express GFP::WAGO-4 and TagRFP::ZNFX-1. Image is representative
of more than three animals. b, Co-immunoprecipitation analysis of
HA::ZNFX-1 and Flag::WAGO-4. HA::HRDE-1 is a negative control. n = 1
for wild type and HA::HRDE-1, n = 3 for others; n denotes independent
experiments. c, Animals of the indicated genotypes were shifted to growth at
25 °C and progeny were counted for five generations. Data are mean ± s.d.
of n = 3 biologically independent samples. d, Top, Flag::ZNFX-1 was
immunoprecipitated in the RNAi generation from animals treated with or
without oma-1 dsRNA. Co-precipitating RNA was subjected to RT-PCR
to quantify oma-1 mRNA co-precipitating with ZNFX-1 in wild-type or
wago-4(tm1019) animals. Note that ZNFX-1 also binds RNAi-targeted
RNAs in inheriting generations (data not shown). ZNFX-1Δhelicase
contains a 1,487 base-pair (bp) in-frame deletion of the ZNFX-1 helicase
domain. gid-2 is a control mRNA. n = 10 for wild type, n = 4 for others;
n denotes biologically independent samples. Data are mean ± s.d. Bottom,
western blot of immunoprecipitated ZNFX-1 from one RNA precipitate
replicate shown in the top panel. Two unrelated lanes were removed from
this image (see Supplementary Fig. 1).


a haemagglutinin-tagged negative control protein (HA::HRDE-1), 
suggesting a physical interaction between the two proteins (Fig. 2b). 
Second, wago-4 mutant animals behaved like znfx-1 mutant animals in 
molecular assays of RNAi inheritance (Fig. 1f). Third, znfx-1 and wago- 
4 animals share a pleiotropic phenotype: both mutant animals exhibited 
a temperature-sensitive mortal germline (Mrt) phenotype, in which 
mutant animals became sterile several generations after populations 
were shifted to growth at a higher temperature (25°C) (Fig. 2c). Taken 
together, these data show that WAGO-4 functions with ZNFX-1 to 
transmit RNA-based epigenetic information across generations. 

How WAGO-4 and ZNFX-1 promote RNAi inheritance is unclear. 
The closest homologue of ZNFX-1 is SMG-2 (also known as UPF1), 
which marks mRNAs containing premature termination codons’. We 
wondered whether, by analogy, ZNFX-1 might bind and mark mRNAs 
encoded by genes undergoing heritable gene silencing. To test this idea, 
we subjected animals expressing 3×Flag::ZNFX-1 to oma-1 RNAi,
immunoprecipitated 3×Flag::ZNFX-1, and used qRT-PCR to determine
whether oma-1 RNAi caused ZNFX-1 to interact with oma-1
mRNA. Indeed, oma-1 RNAi caused ZNFX-1 to co-precipitate with 
oma-1 mRNA (Fig. 2d). The following three lines of evidence show that 
the interaction of ZNFX-1 with TEI-related RNAs is a sequence-spe- 
cific event directed by the RNAi machinery. First, RNAi that targets 
the lin-15b gene caused ZNFX-1 to interact with the lin-15b mRNA, 
but not the oma-1 mRNA (and vice versa), indicating that ZNFX-1 and 
mRNA interactions are sequence-specific (data not shown). Second, 
most RNA helicases bind RNA via their helicase domains. Deletion of 
the ZNFX-1 helicase domain did not affect ZNFX-1 expression but did 
prevent ZNFX-1 from interacting with oma-1 mRNA (Fig. 2d). Third, 
Fig. 3 | ZNFX-1 and WAGO-4 separate from P granules to form new
foci during germline development. a, b, Micrographs of four-cell
embryos expressing the indicated proteins are shown. P2 blastomeres are
indicated by arrowheads. a, Magnifications of P granules are shown
to the right. Images are representative of more than three animals.
b, Genotypes are znfx-1(gg561) or meg-3(tm4259);meg-4(ax2026). Images
are representative of more than three animals. c, d, Micrographs (c) and
quantification (d) of colocalization between (see Methods) indicated

in wago-4 mutant animals, ZNFX-1 failed to interact with the oma-1 
mRNA in response to oma-1 RNAi (Fig. 2d). We conclude that RNAi 
directs ZNFX-1 to interact with mRNAs undergoing heritable silencing 
and that WAGO-4 is required for this property of ZNFX-1. 


ZNFX-1 and WAGO-4 separate from P granules 

P granules are biomolecular condensates that, like ZNFX-1 and 
WAGO-4 foci, segregate with the germline blastomeres (P0–P4) during
embryonic development®”. The low-complexity protein PGL-1
marks P granules!°. GFP::ZNFX-1 and GFP::WAGO-4 colocalized
with PGL-1::TagRFP in P1–P3 germline blastomeres, suggesting that
ZNFX-1 and WAGO-4 are P granule factors (Fig. 3a). MEG-3 and
MEG-4 are low-complexity domain proteins that are redundantly
required for P granule formation in the P lineage!! (Fig. 3b). In
meg-3/4(−) embryos, ZNFX-1 and WAGO-4 foci failed to segregate with the
P lineage (Fig. 3b). Thus, in early P1–P3 germline blastomeres, ZNFX-1
and WAGO-4 localize to P granules. 

At around the 100-cell stage of embryonic development, the P4 
blastomere divides to give rise to Z2 and Z3, which are the primor- 
dial germ cells of C. elegans. Notably, we found that GFP::ZNFX-1 no 
longer colocalized with PGL-1::TagRFP in Z2 and Z3 (Fig. 3c). Instead, 
GFP::ZNFX-1 appeared in foci that were adjacent to (see below), yet 
distinct from, PGL-1::TagRFP foci (Fig. 3c). Similar results were seen 
when antibodies were used to visualize PGL-1 and ZNFX-1, indicat- 
ing that failure to colocalize was not an artefact of GFP or TagRFP 
epitopes (Extended Data Fig. 6). Quantitative analyses showed that 
the degree to which ZNFX-1 and PGL-1 colocalized changed during 
development, with a transition from colocalized to non-colocalized 
occurring between the P3 and Z2/Z3 cells (Fig. 3d). The ZNFX-1 and 
WAGO-4 foci seen in Z2 and Z3 could form de novo or by the separa- 
tion of ZNFX-1, WAGO-4 and PGL-1 from within pre-existing foci. 
We favour the latter model, as time-lapse imaging in Z2 or Z3 cells 
captured what appeared to be ZNFX-1 and PGL-1 separation events 
(Fig. 3e, f). The separation of ZNFX-1 and WAGO-4 into discrete foci 
could be triggered by phase separation or by segregation of pre-existing 
sub-structures into discrete areas!!. We conclude that, late in germline
development, ZNFX-1 and WAGO-4 are concentrated in foci adjacent 
to P granules. 
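Fig. 3d quantifies how strongly ZNFX-1 and PGL-1 colocalize at each developmental stage, using a procedure described in the Methods, which are not part of this excerpt. Purely as an illustration, assuming a pixel-wise Pearson correlation coefficient (a common colocalization metric, not necessarily the one used here) computed over matched pixels of the two channels, the calculation looks like this. The function name and the line-scan intensities are hypothetical.

```python
import math
from typing import Sequence


def pearson_colocalization(channel_a: Sequence[float], channel_b: Sequence[float]) -> float:
    """Pearson correlation between two channels sampled at the same pixels.

    Values near +1 indicate overlapping signals; adjacent but non-overlapping
    foci give values near zero or below.
    """
    n = len(channel_a)
    if n != len(channel_b) or n < 2:
        raise ValueError("channels must have the same length (at least 2 pixels)")
    mean_a = sum(channel_a) / n
    mean_b = sum(channel_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a channel with constant intensity has no defined correlation")
    return cov / math.sqrt(var_a * var_b)


# Toy 1D line scans (hypothetical intensities) across a granule.
green = [0, 2, 8, 9, 3, 0, 0, 0]
red_same = [0, 1, 4, 5, 2, 0, 0, 0]      # peaks where green peaks
red_adjacent = [0, 0, 0, 1, 4, 9, 8, 1]  # peak sits next to the green one
print(round(pearson_colocalization(green, red_same), 2),
      round(pearson_colocalization(green, red_adjacent), 2))
```

The adjacent-peak case mimics the Z2/Z3 situation described above, where ZNFX-1 foci sit next to, rather than on top of, PGL-1 foci.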


Z granules are liquid-like condensates 

Liquid-like condensates are self-assembling cellular structures that 
form when specific proteins and RNAs undergo liquid–liquid phase
separation from the surrounding cytoplasm. The ability of ZNFX-1 and
WAGO-4 to separate from P granules suggests that ZNFX-1 and 
WAGO-4 foci may also be liquid-like condensates. Liquid-like conden- 
sates are typically spherical in shape and their internal constituents 
undergo rapid internal rearrangements!”'?. Consistent with the idea 
that ZNFX-1 foci are liquid-like condensates, we observed that during 
oocyte maturation, ZNFX-1 foci detached from nuclei and assumed 
spherical shapes (Extended Data Fig. 7). In addition, fluorescence 
recovery after photobleaching (FRAP) experiments showed that within 
ZNFX-1 foci, GFP::ZNFX-1 fluorescence recovered rapidly from
bleaching (t1/2 = 8 s), which is a rate similar to that reported for PGL-1
FRAP in P granules® (Extended Data Fig. 7). Thus, ZNFX-1 and 
WAGO-4 foci (post Z2 or Z3) exhibit properties reminiscent of liq- 
uid-like condensates and, therefore, we refer to these foci as Z granules. 
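FRAP recovery rates like the one quoted above are typically obtained by fitting the post-bleach intensity trace. A minimal sketch follows, assuming a single-exponential recovery F(t) = F∞ − (F∞ − F0)·e^(−kt), a plateau F∞ supplied by the user, and time measured from the bleach; the half-time is then ln 2 / k. This is an illustrative fit on synthetic data, not the procedure used in the paper, and the function name is hypothetical.

```python
import math
from typing import Sequence


def frap_half_time(times: Sequence[float], intensities: Sequence[float],
                   plateau: float) -> float:
    """Half-time of a single-exponential FRAP recovery.

    times are seconds after bleaching (first point at t = 0, the post-bleach
    minimum); plateau is the recovered intensity F_inf. The linearized model
    ln((F_inf - F) / (F_inf - F0)) = -k t is fitted by least squares through the
    origin, and ln 2 / k is returned.
    """
    f0 = intensities[0]
    if plateau <= f0:
        raise ValueError("plateau must exceed the post-bleach intensity")
    sum_ty = 0.0
    sum_tt = 0.0
    for t, f in zip(times, intensities):
        remaining = (plateau - f) / (plateau - f0)
        if remaining <= 0:  # points at or above the plateau have no defined logarithm
            continue
        sum_ty += t * math.log(remaining)
        sum_tt += t * t
    if sum_tt == 0:
        raise ValueError("need post-bleach points with t > 0 below the plateau")
    k = -sum_ty / sum_tt
    if k <= 0:
        raise ValueError("no recovery detected")
    return math.log(2) / k


# Synthetic recovery with a true half-time of 8 s (illustrative, noise-free).
k_true = math.log(2) / 8.0
t_points = [2.0 * i for i in range(21)]  # 0 to 40 s
f_points = [1.0 - 0.8 * math.exp(-k_true * t) for t in t_points]
print(round(frap_half_time(t_points, f_points, plateau=1.0), 3))  # 8.0
```

With noisy measurements a direct nonlinear fit that also estimates the plateau is usually preferred, because the log transform amplifies noise near the plateau.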


Z granules assemble into tri-droplet structures

C. elegans germ cells possess at least two other foci (processing bodies 
and Mutator foci) with properties similar to liquid-like condensates!*">. 
TagRFP::ZNFX-1 did not colocalize with MUT-16::GFP, which marks
Mutator foci, nor did GFP::ZNFX-1 colocalize with mCherry::PATR-1 
or mRuby::DCAP-1, which mark processing bodies (Fig. 4a and 
Extended Data Fig. 7). Interestingly, although Z granules did not colo- 
calize with Mutator foci, the relative positions of these two foci were not 
random. Z granules were usually (89% of the time, n = 35) found closely 
apposed to (no empty space between fluorescence signals) a Mutator
focus (Fig. 4a). Similarly, Z granules were usually (91% of the time,
n= 35) found closely apposed to a P granule (Fig. 4a). Quantification 
of distances between surfaces and centres of fluorescence for the three 
Fig. 4 | Z granules assemble into tri-condensate (PZM) structures
with P granules and Mutator foci. a, Top, fluorescent micrographs of
a pachytene germ-cell nucleus from animals expressing the indicated
fluorescent proteins. Bottom, 3D renders of representative foci. Images
are representative of more than three animals. b, Distance between the
centres (left) and surfaces (right) of the spaces occupied by the indicated
fluorescent proteins was calculated as described in the Methods. Data
are mean ± s.d. of 10 granules measured in 3 animals (30 total). Column

foci supported the idea that Z granules localize adjacent to P granules 
and Mutator foci in adult germ cells (Fig. 4b). This analysis also showed 
that the distance between the surfaces of Z granules and P granules or 
Mutator foci (but not the distance between P granules and Mutator foci) 
lies within the diffraction limit of light, indicating that Z granules exist 
in very close proximity to, and may be in direct physical contact with,
P granules and Mutator foci (Fig. 4b). Note that although Z granules 
are intimately associated with P granules and Mutator foci in adult 
germ cells (and throughout most of germline development), they can 
exist independently. For instance, in the adult germline, Z granules 
remained visible at developmental time points when P granules were 
no longer present (Extended Data Fig. 7). Similarly, Z granules are 
present in developing germ cells at time points (that is, P blastomeres) 
when Mutator foci are not thought to be present/4. In addition, shearing
force causes P granules in pachytene-stage germ cells to disengage
from nuclei and flow through the germline syncytium’. After applying 
shearing force, we found that P granules flowed through the cytoplasm; 
however, Z granules remained largely static (Extended Data Fig. 7). 
Thus, Z granules can be separated from P granules and Mutator foci 
both developmentally and physically. We conclude that Z granules rep- 
resent an independent form of liquid-like condensate, which closely 
mirror P granules and Mutator foci in adult germ cells. 
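The centre-to-centre distances in Fig. 4b depend on locating each focus and on correcting the chromatic shift measured with multicolour TetraSpeck beads (Fig. 4 legend). The Methods are not part of this excerpt, so the following is only a sketch under stated assumptions: each focus is summarized by its intensity-weighted centroid in µm, and the bead-derived shift is a single constant 3D offset of one channel relative to the other. Function names and coordinates are hypothetical.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float, float]


def weighted_centroid(voxels: Iterable[Tuple[Point, float]]) -> Point:
    """Intensity-weighted centre of a focus from (position, intensity) voxels."""
    voxels = list(voxels)
    total = sum(weight for _, weight in voxels)
    if total <= 0:
        raise ValueError("focus has no positive intensity")
    return tuple(
        sum(pos[axis] * weight for pos, weight in voxels) / total for axis in range(3)
    )


def corrected_distance(centre_a: Point, centre_b: Point, shift_b: Point) -> float:
    """Distance after removing the bead-measured apparent offset of channel B."""
    corrected_b = tuple(b - s for b, s in zip(centre_b, shift_b))
    return math.dist(centre_a, corrected_b)


# Hypothetical example (µm): on beads, channel B appears 0.1 µm too far along x.
z_granule = weighted_centroid([((0.0, 0.0, 0.0), 1.0), ((0.2, 0.0, 0.0), 1.0)])
p_granule = weighted_centroid([((0.6, 0.0, 0.3), 2.0), ((0.6, 0.0, 0.3), 1.0)])
print(z_granule, round(corrected_distance(z_granule, p_granule, (0.1, 0.0, 0.0)), 3))
```

Correcting the offset matters here because the reported Z-to-P and Z-to-M surface distances fall within the diffraction limit, where a chromatic shift of a few hundred nanometres would be comparable to the effect being measured.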

Our data suggest that Z granules may localize between (bridge) 
P granules and Mutator foci. To test this idea, we imaged the three 
foci simultaneously using animals that express PGL-1::mCardinal,
TagRFP::ZNFX-1 and MUT-16::GFP®. This analysis confirmed the idea
that Z granules bridge P granules and Mutator foci (Fig. 4c). In 60%
(52 out of 86) of cases, we observed a Z granule in close apposition to
both a P granule and a Mutator focus, and in 92% (48 out of 52) of
these cases, the Z granule lay between the other two foci. In no case
(0 out of 52) did a P granule or a Mutator focus bridge the other two
types of foci. Quantification of the distances between the
centres and surfaces of Z granules, P granules and Mutator foci from 
7 shows a chromatic shift associated with TetraSpeck beads. Data in the
right panel have been corrected for this shift. c, Fluorescent micrograph
of the indicated fluorescent proteins from a pachytene germ cell. Image is
representative of more than three animals (see Extended Data Fig. 8
for quantifications). Scale bars, 2 µm (a, germ cells), 0.5 µm (a, single
granules) and 0.25 µm (c). Positions of nuclear membrane, nuclear pores
and PZM segments are not yet known.


triple-marked images support the idea that Z granules act as a bridge 
between P granules and Mutator foci in adult germ cells (Extended Data 
Fig. 8). We conclude that P granules, Z granules and Mutator foci form 
tri-condensate assemblages (henceforth termed PZMs) in adult germ 
cells, and that the relative position of the three liquid-like condensates 
constituting the PZM is ordered. 
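One way to formalize "the Z granule lay between the other two foci" for three roughly collinear condensates is to call the middle member the one that is not part of the most widely separated pair. The rule, the function name and the centroid coordinates below are hypothetical illustrations, not the authors' scoring criterion; the final line simply recomputes the percentages from the counts reported in the text (52 of 86 and 48 of 52).

```python
import math
from itertools import combinations
from typing import Dict, Tuple

Point = Tuple[float, float, float]


def middle_focus(centres: Dict[str, Point]) -> str:
    """Label of the focus lying between the other two (exactly three foci expected).

    The two foci farthest apart are taken as the ends of the assemblage;
    the remaining one is treated as the bridge.
    """
    if len(centres) != 3:
        raise ValueError("expected exactly three foci")
    ends = max(combinations(centres, 2),
               key=lambda pair: math.dist(centres[pair[0]], centres[pair[1]]))
    (middle,) = set(centres) - set(ends)
    return middle


# Hypothetical centroids (µm) for one P-Z-M assemblage.
assemblage = {"P": (0.0, 0.0, 0.0), "Z": (0.35, 0.05, 0.0), "M": (0.7, 0.0, 0.1)}
print(middle_focus(assemblage))  # Z

# Percentages reported in the text, recomputed from the counts.
print(f"Triple contact in {52 / 86:.0%} of cases; Z granule in the middle in {48 / 52:.0%} of those")
```

This criterion only makes sense for near-linear arrangements; strongly bent assemblages would need an explicit angle or contact test instead.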

Following these observations, we sought to determine whether PZM 
assembly was required for RNA-directed TEI. Factors concentrated 
in Mutator foci*!”-!? and Z granules (this work) contribute to RNA- 
directed TEI. Furthermore, we find that several factors, known to be 
required for P granule assembly, are also needed for efficient RNAi 
inheritance (Extended Data Fig. 9). Thus, factors associated with all 
three segments of the PZM have now been linked to TEI. We also find 
that in mutant animals with defective P granules, Z granules become 
malformed and ZNFX-1 fails to bind TEI-related RNAs, hinting that 
the segments of the PZM may communicate with each other during 
TEI (Extended Data Fig. 9). The results are consistent with the idea that 
PZM assembly is important for RNA-directed TEI. 


Discussion 

We show here that the inheritance factors ZNFX-1 and WAGO-4 local- 
ize to a liquid-like condensate that we name the Z granule. Given that Z 
granules segregate with the germline, we speculate that one function of 
the Z granule is to concentrate and segregate silencing factors into the 
germline to promote RNA-based TEI. ZNFX-1 is a conserved RNA hel- 
icase that localizes to Z granules and marks RNAs produced from genes 
undergoing heritable silencing. The S. pombe orthologue of ZNFX-1 
is Hrr1, which forms a nuclear complex (termed the RDRC) with 
Argonaute and RdRP to amplify siRNA populations directing peri- 
centromeric heterochromatin’. We speculate that a C. elegans version 
of the RDRC acts in the cytoplasm where it promotes RNAi inheritance 
by: (1) binding inherited siRNAs (via WAGO-4), (2) marking mRNAs 
complementary to inherited siRNAs (via ZNFX-1), (3) using marked 


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


mRNAs as templates for RdRP-based siRNA amplification, and (4) 
repeating this cycle each generation (Extended Data Fig. 10). Note that a
related study suggests that the function of ZNFX-1 in RNA marking
may involve positioning RdRP enzymes to prevent 5’ drift of AGOs 
targeting mRNAs”. 

ZNFX-1 and WAGO-4 separate from components of the P granule
during early embryogenesis to form an independent liquid-like con- 
densate. Separation occurs at a developmental time that roughly corre- 
lates with the first association of P granules with nuclear pores and the 
advent of germline transcription®**”°. We speculate that condensate 
separation might be triggered when newly synthesized mRNAs transit 
P granules and interact with RNA-binding proteins to alter local protein 
concentration and initiate separation. In addition to temporal order- 
ing, we find that Z granules are spatially ordered relative to P granules 
and Mutator foci, with Z granules forming the centrepiece of PZM 
tri-condensate assemblages in adult germ cells. These results show that 
mechanism(s) exist to organize and arrange liquid-like condensates in 
space as well as time. Further work is needed to understand how PZM 
segments assemble in the correct order and to determine whether and
how PZM assembly contributes to RNA-based TEI. Small RNA-based
pathways in animals are complex with many thousands of small regu- 
latory RNAs that regulate thousands of mRNAs at almost all levels of 
gene expression. We speculate that the ordering of liquid-like conden- 
sates in space and time helps to organize and coordinate these small 
RNA pathways, including RNA-directed TEI (Extended Data Fig. 10). 
Similar strategies may be used by cells to organize and coordinate other 
gene regulatory or biochemical pathways. 


Online content 

Any Methods, including any statements of data availability and Nature Research 
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at https://doi.org/10.1038/s41586-018-0132-0.


METHODS 


No statistical methods were used to predetermine sample size. The experiments 
were not randomized, and investigators were not blinded to allocation during 
experiments and outcome assessment. 

Strain list. N2 (WT); (NL1870) mut-16(pk710); (YY009) eri-1(mg366), (YY193)
eri-1(mg366); nrde-2(gg91), (YY502) nrde-2(gg91), (YY503) nrde-2(gg90),
(YY538) hrde-1(tm1200), (YY562) hrde-1(tm1200); oma-1(zu405), (YY913)
nrde-2(gg518[nrde-2::3xflag::ha]), (YY916) znfx-1(gg544[3xflag::gfp::znfx-1]),
(YY947) hrde-1(tm1200); nrde-2(gg518), (YY967) pgl-1(gg547[pgl-1::3xflag::tagrfp]),
(YY968) znfx-1(gg544); pgl-1(gg547), (YY996) znfx-1(gg561), (TX20)
oma-1(zu405), (YY998) znfx-1(gg544); ego-1(gg644[ha::tagrfp::ego-1]), (YY1020)
znfx-1(gg561); oma-1(zu405), (SX461) mjIS31(pie-1::gfp::h2b), (SS579)
pgl-1(bn101), (JH3225) meg-3(tm4259); meg-4(ax2026), (DG3226) deps-1(bn124),
(YY1006) eri-1(mg366); znfx-1(gg561), (YY1003) eri-1(mg366); znfx-1(gk458570),
(YY1021) znfx-1(gg561); nrde-2(gg518), (YY1062) znfx-1(gk458570), (YY1081)
deps-1(bn124); mjIS31, (YY1083) wago-4(tm1019), (YY1084) wago-4(tm2401),
(YY1093) wago-4(tm1019); mjIS31, (YY1094) wago-4(tm2401); mjIS31, (YY1108)
znfx-1(gg575); mjIS31, (YY1109) mjIS31; dpy-10(cn64), (YY1110) eri-1(mg366);
wago-4(tm1019), (YY1111) eri-1(mg366); wago-4(tm2401), (YY1153)
wago-4(tm1019), znfx-1(gg544), mjIS31, (YY1175) hrde-1(gg594[ha::hrde-1]), (YY1287)
znfx-1(gg611[ha::znfx-1]), (YY1305) meg-3(tm4259); meg-4(ax2026),
znfx-1(gg544), (YY1308) meg-3(tm4259); meg-4(ax2026), pgl-1(gg547), (YY1325)
wago-4(gg620[3xflag::gfp::wago-4]), (YY1326) wago-4(gg620); znfx-1(gg561),
(YY1327) pgl-1(gg547); wago-4(gg620), (YY1346) pgl-1(gg547); znfx-1(gg561),
(YY1364) meg-3(tm4259); meg-4(ax2026), wago-4(gg620), (YY1388)
wago-4(gg627[3xflag::wago-4]), (YY1393) znfx-1(gg611); wago-4(gg627), (YY1408)
znfx-1(gg575); mjIS31; dpy-10(cn64), (YY1416) znfx-1(gg544); axIs1488, (YY1419)
znfx-1(gg561); wago-4(tm1019); oma-1(zu405), (YY1442) znfx-1(gg544); hjSi397,
(YY1446) znfx-1(gg634[ha::tagrfp::znfx-1]), (cmp3) mut-16::gfp::flag+loxP,
(YY1444) znfx-1(gg634); mut-16[mut-16::gfp::flag+loxP], (YY1452) znfx-1(gg544);
ltIs37(pie-1::mcherry::his58), (YY1453) znfx-1(gg634); wago-4(gg620), (YY1460)
mut-16(mut-16::gfp::flag); znfx-1(gg561), (YY1461) mut-16(mut-16::gfp::flag);
wago-4(tm1019), (YY1486) znfx-1(gg631[3xflag::gfp::znfx-1 Δhelicase]), (YY1491)
wago-4(gg620); oma-1(zu405), (YY1492) pgl-1(gg640[pgl-1::3xflag::mcardinal]);
mut-16[mut-16::gfp::flag+loxP]; znfx-1(gg634), (YY1494) wago-4(tm2401);
pgl-1(gg547), (YY1503) pgl-1(gg547); mut-16[mut-16::gfp::flag+loxP], (YY1556)
wago-4(gg627); hrde-1(gg594).

CRISPR-Cas9. gRNAs were chosen using ApE according to the following criteria:
first, the PAM site was in the context GGNGG or GNGG; second, the GC content of
the 20-bp spacer sequence was 40% to 60%; third, specificity was high according to
http://crispr.mit.edu. All CRISPR editing was done using a co-CRISPR strategy. Plasmids
were purified with PureLink HiPure Plasmid Kits (Thermo Fisher). For deletions, two
gRNAs (20 ng µl−1) were co-injected into gonads with pDD162 (50 ng µl−1), unc-58
gRNA (20 ng µl−1), AF-JA-76 (20 ng µl−1) and 1× Taq buffer. For 3×Flag or HA
epitope tagging, single-stranded oligonucleotides (4 nM Ultramers from IDT, purified
by isopropanol precipitation) with 50-bp homology regions were used as repair
templates. gRNA (20 ng µl−1) and repair template (20 ng µl−1) were co-injected into
gonads with pDD162 (50 ng µl−1), unc-58 gRNA (20 ng µl−1), AF-JA-76 (20 ng µl−1)
and 1× Taq buffer. For GFP, TagRFP or mCardinal tagging, repair templates contained
homology arms of 500 bp to 1,000 bp and were cloned into pGEM-7zf(+).
Sequences were confirmed by Sanger sequencing. Repair templates were amplified
by PCR, gel purified and isopropanol precipitated. The PCR product was heated at
95°C for 5 min and then immediately put on ice for at least 2 min. The injection mix
contained pDD162 (50 ng µl−1), unc-58 gRNA (20 ng µl−1), AF-JA-76 (20 ng µl−1),
gRNAs close to the N or C terminus of the gene (20 ng µl−1), heated and
cooled repair template (50 ng µl−1) and 1× standard Taq buffer. Injected animals
were maintained at 25°C; Unc animals were isolated 4 days later and grown at
20°C. Animals were screened for deletion or tagging by PCR.
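The first two gRNA selection criteria can be expressed as a simple sequence filter. The sketch below is illustrative only: the authors used ApE and the crispr.mit.edu specificity score, not this code. The function names are hypothetical, only the forward strand is scanned, the SpCas9 PAM is assumed to be NGG, and the third (off-target specificity) criterion is not modelled.

```python
import re

PAM_LEN = 3  # assumed SpCas9 NGG PAM


def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in seq."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def candidate_spacers(target: str, spacer_len: int = 20):
    """Scan the forward strand for 20-nt spacers followed by NGG that meet the
    PAM-context (GGNGG or GNGG) and 40-60% GC criteria.
    Returns a list of (start, spacer, pam, context) tuples."""
    target = target.upper()
    hits = []
    for start in range(len(target) - spacer_len - PAM_LEN + 1):
        spacer = target[start:start + spacer_len]
        pam = target[start + spacer_len:start + spacer_len + PAM_LEN]
        tail = spacer[-2:] + pam  # last two spacer bases followed by the PAM
        if re.fullmatch(r"GG[ACGT]GG", tail):
            context = "GGNGG"  # stricter, preferred context
        elif re.fullmatch(r"G[ACGT]GG", tail[1:]):
            context = "GNGG"
        else:
            continue  # no NGG PAM, or the spacer does not end in G
        if 0.40 <= gc_fraction(spacer) <= 0.60:
            hits.append((start, spacer, pam, context))
    return hits
```

Every GGNGG site also matches GNGG, so the sketch labels each hit with the stricter context when it applies; a real design would also scan the reverse complement and then rank hits by an off-target score.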

RNA immunoprecipitation. Animals were flash frozen in liquid nitrogen and
stored at −80°C. Animals were resuspended in sonication buffer (20 mM Tris-HCl
pH 7.5, 200 mM NaCl, 2.5 mM MgCl2, 10% glycerol, 0.5% NP-40, 80 U ml−1
RNaseOUT, 1 mM dithiothreitol (DTT) and protease inhibitor cocktail without
EDTA) and sonicated (30 s on, 30 s off, 20-30% output for 2 min on a Qsonica
Q880R sonicator; repeated once). Lysates were clarified by centrifugation at 18,400g for
15 min. Supernatants were precleared with protein A agarose beads and incubated
with Flag-M2 agarose beads for 2-3 h at 4°C. Beads were washed six times with RIP buffer
(20 mM Tris-HCl pH 7.5, 200 mM NaCl, 2.5 mM MgCl2, 10% glycerol, 0.5% NP-40).
Proteins and associated RNAs were eluted with 100 µg ml−1 3×Flag peptide.
RNAs were treated with Turbo DNase I for 20 min at 37°C and then extracted
with TRIzol reagent, followed by isopropanol precipitation.

RT-qPCR. mRNA isolated from total RNA or from RNA immunoprecipitation
experiments was converted to cDNA using the iScript cDNA synthesis kit
according to the vendor's instructions. The following primer sequences were used to
quantify mRNA levels. oma-1 mRNA: 5'-GCTTGAAGATATTGCATTCAACC-3'
(forward primer); 5'-AACTGTTGAAATGGAGGTGC-3' (reverse
primer). oma-1 pre-mRNA: 5'-GTGCGTTGGCTAATTTCCTG-3' (forward
primer); 5'-CTGAATCGCGCGAACTTG-3' (reverse primer).
gld-2 mRNA: 5'-ACGTGTAGAAAGGGCTGCAC-3' (forward primer);
5'-GTCGATGCAGATGATGATGG-3' (reverse primer). gld-2 pre-mRNA:
5'-CCTTATTAATTTCAGAGCTGCTGTC-3' (forward primer);
5'-AAGACTAGCACACGCAATCG-3' (reverse primer). eft-3 pre-mRNA:
5'-CCTGCAAGTTCAACGAGCTTA-3' (forward primer);
5'-TGAAAAACAAATTGGTACATAAAC-3' (reverse primer).

Mrt assay. Each generation, 3-6 L4 animals were picked to a single plate and
grown at 25°C; average brood sizes were calculated by counting the total number
of progeny per plate and dividing by the number of animals picked.
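The per-plate arithmetic of the Mrt assay can be sketched as follows. The record type, function names and counts are hypothetical and chosen only to illustrate the calculation; they are not data from this study. Here a mortal germline (Mrt) lineage is taken to be one whose brood size falls to zero over successive generations at 25°C.

```python
from dataclasses import dataclass


@dataclass
class MrtPlate:
    """One plate in a hypothetical Mrt assay record."""
    generation: int
    animals_picked: int  # 3-6 L4 animals per plate
    total_progeny: int   # all progeny counted on that plate


def average_brood(plate: MrtPlate) -> float:
    """Average brood size per picked animal."""
    return plate.total_progeny / plate.animals_picked


def first_sterile_generation(plates):
    """Earliest generation whose plate produced no progeny, or None."""
    for plate in sorted(plates, key=lambda p: p.generation):
        if plate.total_progeny == 0:
            return plate.generation
    return None


# Illustrative numbers only, not data from this study.
lineage = [MrtPlate(1, 5, 1200), MrtPlate(2, 4, 640), MrtPlate(3, 6, 0)]
for p in lineage:
    print(p.generation, average_brood(p))
print("sterile at generation", first_sterile_generation(lineage))
```

Dividing by the number of animals picked keeps plates with 3 and 6 parents comparable, which is why the per-animal average, not the raw plate total, is tracked across generations.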

RNAi inheritance assays. For dpy-11 and gfp RNAi inheritance, embryos were 
collected via hypochlorite treatment and placed onto HT115 bacteria expressing 
dsRNA against dpy-11 or gfp. F1 embryos were collected by hypochlorite treatment
from RNAi- or control-treated adults and placed onto non-RNAi plates. Worms 
were scored at late L4 (dpy-11) or early young adult (gfp) stages. 

For oma-1 RNAi inheritance, experiments were done at 20°C. Embryos were 
collected via hypochlorite treatment and placed onto HT115 bacteria expressing
dsRNA against oma-1. Six F1 embryos were picked onto a single OP50 plate.
From F2 to F6, six L4 animals were picked onto a single OP50 plate each generation. wago-4(tm1019) is a
571-bp deletion that removes part of the PIWI domain. tm1019 also introduces 
a frameshift deletion that would be expected to prevent translation of the rest of 
the PIWI domain. znfx-1(gg561) is an 8,476-bp deletion that removes most (2,300
out of 2,400 amino acids) of ZNFX-1, including the helicase domain. Both alleles
are presumably null.

Co-immunoprecipitation. Young adults were flash frozen in liquid nitrogen.
Animals were ground into powder in liquid nitrogen and resuspended in 1 ml 1×
lysis buffer (20 mM HEPES pH 7.5, 100 mM NaCl, 5 mM MgCl2, 1 mM EDTA, 10%
glycerol, 0.25% Triton, 1 mM freshly made PMSF, 1× complete protease inhibitor
from Roche without EDTA) and rotated for 45 min at 4°C. Lysate was cleared by
spinning at 2,300g for 15 min, and 30 µl protein G beads were added to preclear the lysate
for 30 min. 3×Flag::WAGO-4 proteins were pulled down with 30 µl of agarose beads
conjugated to anti-Flag antibody (A2220, Sigma-Aldrich). Input and immunoprecipitated
proteins were separated by SDS-PAGE and detected with Flag M2 antibody
and HA antibody (Roche, 3F10).

Small RNA sequencing. Total RNA was extracted using TRIzol. Total RNA (20 µg)
was separated on a 15% urea gel. Small RNAs of about 18-35 nucleotides were
cut from the gel. Small RNAs were cloned using a 5′-monophosphate-independent
small RNA protocol as previously described. Libraries were multiplexed with a
4-nucleotide 5′ barcode and a 6-nucleotide 3′ barcode and pooled for next-generation
sequencing on a NextSeq 500. FastX 0.0.13 was used to separate reads that
contained the 3′ adaptor and to filter low-quality reads for further analysis. Reads
>14 nucleotides were mapped to the C. elegans genome (WS220) using Bowtie.
Read counts were normalized to the total number of reads matching the genome.
Two independent libraries were prepared and the two replicates were combined
for Fig. 1f.
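The length filter and normalization described above can be sketched as follows. This is not the authors' custom script (which is available on request). Two details are assumptions because the Methods do not state them: counts are scaled per million genome-matching reads, and replicates are combined by averaging. All function names are hypothetical.

```python
def keep_long_reads(reads, min_exclusive=14):
    """Keep reads longer than 14 nt, matching the '>14 nucleotides' filter."""
    return [r for r in reads if len(r) > min_exclusive]


def normalize_counts(counts, total_genome_matching, per=1_000_000):
    """Scale raw counts by the total number of genome-matching reads.
    The per-million scale factor is an assumption."""
    return {feature: n * per / total_genome_matching for feature, n in counts.items()}


def combine_replicates(*libraries):
    """Average normalized values across replicate libraries (assumed method);
    a feature absent from a library contributes zero for that library."""
    features = set().union(*libraries)
    return {
        f: sum(lib.get(f, 0.0) for lib in libraries) / len(libraries)
        for f in features
    }
```

Normalizing each library by its own genome-matching total before combining keeps a deeper library from dominating the combined profile.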

Microscopy and analysis. To image larval and adult stages, animals were immobilized
in M9 with 0.1% sodium azide and mounted on glass slides. To image
embryos, gravid adults were dissected on a coverslip containing 10 µl of 1× egg
buffer and then mounted on freshly made 3% agarose pads. Animals were imaged
immediately with a Nikon Eclipse Ti microscope equipped with a Yokogawa W1
spinning disk with a 50-µm pinhole disk and an Andor Zyla 4.2 Plus sCMOS
monochrome camera. A 60×/1.4 Plan Apo Oil objective was used unless otherwise
stated. pie-1::gfp::h2b imaging was done using a widefield Zeiss Axio Observer.Z1
microscope equipped with a Plan-Apochromat 63×/1.40 Oil DIC M27 objective
and an ORCA-Flash 4.0 CMOS camera.

Colocalization. The degree of colocalization between different fluorescently
labelled proteins across development was calculated using the Coloc2 plugin in
ImageJ. Animals were imaged as described above, except that a
100×/1.45 Plan Apo Oil objective was used. Around 3-5 granules were selected from at
least 3 different animals at each developmental stage specified. Region of
interest (ROI) masks were generated using the 3D ROI Manager plugin in ImageJ
to eliminate black regions surrounding granules. Coloc2 was used to generate a
Pearson's R value for the degree of colocalization between two channels in the region
defined by the ROI mask.
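The core quantity reported here, Pearson's R between two channels restricted to an ROI mask, can be sketched with NumPy. This is a minimal illustration, not Coloc2 itself: Coloc2 also offers thresholded variants and other statistics that are not reproduced, and the function name is hypothetical.

```python
import numpy as np


def pearson_in_roi(ch1, ch2, mask):
    """Pearson's R between two channels, using only voxels where mask is True."""
    a = np.asarray(ch1, dtype=float)
    b = np.asarray(ch2, dtype=float)
    m = np.asarray(mask, dtype=bool)
    if not (a.shape == b.shape == m.shape):
        raise ValueError("channels and mask must share one shape")
    x = a[m] - a[m].mean()  # centre each channel within the ROI only
    y = b[m] - b[m].mean()
    denom = np.sqrt((x * x).sum() * (y * y).sum())
    if denom == 0:
        raise ValueError("a channel is constant inside the ROI")
    return float((x * y).sum() / denom)
```

Restricting the calculation to the mask matters because dark voxels around a granule are low in both channels; including them would inflate R even for non-overlapping foci, which is why the ROI masks exclude those regions.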

FRAP. FRAP experiments were conducted using a Zeiss LSM 780 point-scanning
confocal equipped with a Quasar PMT ×2 + GaAsP 32-channel spectral detector,
using a 63×/1.4 Plan Apo Oil objective. Adult animals (for pachytene germ cells)
or embryos (for P, blastomere) were suspended in a mixture of 0.5% sodium azide 
and 50% 0.1-µm polystyrene beads (Polysciences) to inhibit movement. The mixture
was added to a coverslip and placed on a fresh 3% agarose pad. Slides were
sealed with nail polish. The bleaching plugin within the Zeiss ZEN Black software was
used to specify the ROI to be bleached. One ROI was used for all data points.
Single z-slice images were acquired at 1-s intervals for 15 s, followed by bleaching,
and then at 1-s intervals for a further 85 s. Images were aligned using neighbouring
granules in ImageJ to account for subtle shifts in movement. An ROI was generated
around the bleached region and continuously measured across all time points using
the plot profile function within ImageJ. Data were normalized to an unbleached
control granule to account for background bleaching throughout the 100-s period.
Normalized data points were averaged across all seven granules and plotted using
Prism. The heat map of a representative granule was generated using the thermal
LUT within ImageJ.
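The normalization and averaging steps can be sketched as below. Dividing by the unbleached control granule follows the text. The additional rescaling so that the mean pre-bleach value equals 1, and the choice of 15 pre-bleach frames, are assumptions made for illustration. Mean ± s.e.m. matches how the recovery curve is reported in Extended Data Fig. 7b. Function names are hypothetical.

```python
import numpy as np


def normalize_frap(bleached, control, n_prebleach=15):
    """Correct a bleached-ROI trace for acquisition bleaching using an
    unbleached control granule, then rescale so the pre-bleach mean is 1
    (the rescaling step is an assumption)."""
    ratio = np.asarray(bleached, float) / np.asarray(control, float)
    return ratio / ratio[:n_prebleach].mean()


def mean_and_sem(curves):
    """Point-by-point mean and standard error across normalized curves."""
    stacked = np.vstack(curves)
    mean = stacked.mean(axis=0)
    sem = stacked.std(axis=0, ddof=1) / np.sqrt(stacked.shape[0])
    return mean, sem
```

Dividing by a control granule imaged in the same frames cancels the slow fluorescence loss caused by repeated acquisition, so the remaining change in the bleached ROI reflects recovery by exchange with the surrounding pool.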

Quantification of distances between foci centres and surfaces. We imaged
pachytene germ cell nuclei in three animals. Approximately ten granules
were selected from each animal. Confocal z stacks were opened with the 3D
Objects Counter plugin in ImageJ to generate x, y and z coordinates for the
centre of each object. To account for chromatic shift between channels, 0.1-µm
TetraSpeck beads were imaged and granule distances were corrected accordingly.
Distances between foci surfaces were calculated with 3D ROI Manager in ImageJ.
The thresholding function within 3D ROI Manager was used to eliminate background
signal.
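The bead-based correction of centre-to-centre distances can be sketched as follows. The text does not state how the correction was applied. This sketch assumes the chromatic shift is a single uniform 3D translation, estimated as the mean channel-2 minus channel-1 offset of beads detected in both channels, with coordinates in micrometres. Function names are hypothetical.

```python
import numpy as np


def chromatic_shift(beads_ch1, beads_ch2):
    """Mean (x, y, z) offset of channel 2 relative to channel 1, from beads
    seen in both channels (row i in each array is the same bead)."""
    return (np.asarray(beads_ch2, float) - np.asarray(beads_ch1, float)).mean(axis=0)


def corrected_centre_distance(centre_ch1, centre_ch2, shift):
    """Euclidean distance between two foci centres after removing the shift
    from the channel-2 coordinates."""
    c1 = np.asarray(centre_ch1, float)
    c2 = np.asarray(centre_ch2, float) - np.asarray(shift, float)
    return float(np.linalg.norm(c2 - c1))
```

Because adjacent Z, P and Mutator foci sit a fraction of a micrometre apart, an uncorrected inter-channel offset of even ~0.1 µm would be a sizeable fraction of the measured distance.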

Immunofluorescence. Approximately 30 animals were sliced open in 8 µl of 1×
egg buffer (25 mM HEPES, pH 7.3, 118 mM NaCl, 48 mM KCl, 2 mM CaCl2,
2 mM MgCl2) to isolate gonads and embryos. A coverslip was added and slides
were placed on a metal block (chilled on dry ice) for 10 min. Coverslips were
popped off and slides were submerged in methanol at −20°C for 10 min, followed
by acetone at −20°C for 5 min. Samples were allowed to dry at room temperature
for 3 min. Then, 500 µl of PBS with Tween 20 (1× PBST) was added to each sample
and incubated for 5 min at room temperature, followed by 500 µl of 1× PBST
with 1% bovine serum albumin (BSA) for 30 min at room temperature. Antibody
solution (50 µl of 1× PBST, 1% BSA, 1:20 dilution of anti-PGL-1 antibody (K76
from DSHB) and 1:250 dilution of anti-HA antibody (Abcam ab9110)) was added
to each sample. Samples were covered with parafilm and incubated overnight at
room temperature inside a humid chamber. Samples were washed three times in
1× PBST at room temperature for 10 min. Secondary antibodies (Alexa Fluor 555
goat anti-rabbit, Life Technologies A21429; and Alexa Fluor 488 goat anti-mouse,
Life Technologies A10667) were diluted 1:50 in 1× PBST. Secondary solution
(50 µl) was added to each sample, covered with parafilm and incubated for 90 min
in the dark at room temperature. Samples were washed three times in 1× PBST at
room temperature for 10 min. Vectashield antifade (15 µl) and DAPI were added
to each sample. Slides were sealed with nail polish.
Reporting summary. Further information on experimental design is available in
the Nature Research Reporting Summary linked to this paper. 

Code availability. Custom python scripts used to generate small RNA plots in 
Fig. 1f are available upon request. 

Data availability. Small RNA sequencing data that support the findings of this
study have been deposited in the Gene Expression Omnibus (GEO) database with the
accession code GSE112109. Source data for Fig. 2b, d are located in Supplementary
File 1. The remaining data that support the findings of this study are available from
the corresponding author upon reasonable request.
Extended Data Fig. 1 | Genetic screen to identify novel RNAi
inheritance mutants. a, A genetic screen was conducted to identify
components of the C. elegans RNAi inheritance machinery. The screen
contained several filters (see below) to remove known RNAi inheritance
factors. Factors defective for RNAi inheritance are also defective for nuclear
RNAi. Therefore, our screen began with selections for mutant alleles
that disrupt nuclear RNAi. Two selections were developed for nuclear
RNAi mutants. First, the lin-15b and lin-15a genes are transcribed as a
polycistronic message that is spliced within the nucleus into lin-15b and
lin-15a mRNAs. Animals containing mutations in both lin-15b and
lin-15a exhibit a multivulva (Muv) phenotype. RNAi targeting lin-15b
(in eri-1(−) animals) silences lin-15b and lin-15a co-transcriptionally,
thus inducing a Muv phenotype. The previously identified nuclear
RNAi factors are required for lin-15b RNAi-induced co-transcriptional
silencing of lin-15a and, therefore, for lin-15b RNAi-induced Muv. A
second assay for nuclear RNAi is lir-1 RNAi. lir-1 RNAi is lethal because
lir-1 is in an operon with lin-26, and co-transcriptional silencing of lin-26
by lir-1 RNAi causes lethality. Nuclear RNAi defective (NRDE) animals
do not die in response to lir-1 RNAi because they fail to silence lin-26.
Previous genetic screens have used suppression of lir-1 RNAi to find
factors required for nuclear RNAi. These screens have reached saturation:
we have identified several alleles in all the nrde genes using this approach.
Unpublished work from our laboratory shows, however, that animals carrying
hypomorphic alleles of the nrde genes will often fail to become Muv in response
to lin-15b RNAi and yet still die in response to lir-1 RNAi. We interpret these
data to mean that survival from lir-1 RNAi is a much stronger selection for nuclear RNAi
mutants than a failure to form Muv in response to lin-15b RNAi. That is,
factors that contribute to nuclear RNAi, but are not 100% required for
nuclear RNAi, would not be identified by lir-1 RNAi suppression screens.
Therefore, our screen looked for suppressors of lin-15b RNAi that
did not suppress lir-1 RNAi, because such a screen might identify genes
missed in our previous genetic screens. Step 1, identify factors required
for nuclear RNAi. eri-1(mg366) animals were mutagenized with EMS.
F2 progeny were exposed to bacteria expressing lin-15b dsRNA. Non-Muv
animals were kept as candidate novel nuclear RNAi factors. Step 2,
discard known nuclear RNAi factors. We probably know all non-essential
genes that can mutate to suppress lir-1 RNAi. Therefore, we discarded
mutants that suppressed lir-1 RNAi, as these alleles probably affect known
nuclear RNAi factors. Mutants that did not suppress lir-1 RNAi may contain
mutations in factors important, but not essential, for nuclear RNAi. Step
3, identify mutations that disrupt RNAi inheritance. The last filter in
our screen was to identify mutant alleles that disrupted RNAi inheritance.
We subjected the remaining mutant animals to dpy-11 RNAi, which causes
animals to become Dumpy (Dpy). Progeny of animals exposed to dpy-11
dsRNA inherit dpy-11 silencing and are Dpy. RNAi inheritance mutants
become Dpy in response to dpy-11 RNAi; however, the progeny of these
animals fail to inherit dpy-11 silencing and, therefore, are not Dpy. Thus,
any of our mutant animals that became Dpy in response to dpy-11 RNAi,
but whose progeny were not Dpy, were kept for further analysis. Finally,
only one mutant was kept from each pool (pools were maintained as
independent populations throughout the screen). b, c, Independent alleles
of znfx-1 and wago-4 are (as expected) defective for lin-15b RNAi and
not defective for lir-1 RNAi. Data are mean ± s.d. of more than three
biologically independent samples.
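The three filters of the screen, plus the one-mutant-per-pool rule, amount to a simple decision procedure. The sketch below encodes that logic over hypothetical phenotype records. The field names and example lines are invented for illustration and do not represent the screen's actual data handling.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Phenotypes recorded for one hypothetical mutant line."""
    name: str
    pool: int
    muv_on_lin15b_rnai: bool  # Step 1: still Muv means nuclear RNAi works
    survives_lir1_rnai: bool  # Step 2: survivors are likely known nrde alleles
    dpy_on_dpy11_rnai: bool   # Step 3: directly exposed animals
    progeny_dpy: bool         # Step 3: unexposed progeny


def passes_filters(c: Candidate) -> bool:
    if c.muv_on_lin15b_rnai:
        return False  # nuclear RNAi intact
    if c.survives_lir1_rnai:
        return False  # lir-1 suppressor, probably a known factor
    # Responds to dpy-11 RNAi itself but fails to pass silencing on.
    return c.dpy_on_dpy11_rnai and not c.progeny_dpy


def run_screen(candidates):
    """Keep only the first passing mutant from each independent pool."""
    kept = {}
    for c in candidates:
        if c.pool not in kept and passes_filters(c):
            kept[c.pool] = c
    return list(kept.values())
```

Keeping one mutant per pool avoids recovering siblings that carry the same EMS-induced mutation, so each kept line represents an independent allele.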
Extended Data Fig. 2 | ZNFX-1 is required for RNAi inheritance.
a, Animals expressing a pie-1::gfp::h2b transgene were exposed to gfp
dsRNA. The percentage of the P0, F1 and F2 progeny of the indicated
genotypes expressing GFP was quantified. Data represent scoring of at
least 80 animals in each generation and for each genotype. Note that the gfp
reporter transgene used here is a multi-copy version of the single-copy
reporter used in Fig. 1b. Some RNAi inheritance can be seen in znfx-1 mutant
animals carrying this multi-copy reporter, indicating that, in some cases,
RNAi inheritance can occur in the absence of ZNFX-1.
b, Animals of the indicated genotypes were exposed to dpy-11 dsRNA.
The F1 progeny of these animals were grown in the absence of dpy-11
dsRNA and were scored for Dpy phenotypes. Data are mean ± s.d. of
more than three biologically independent samples. Consistent with the
idea that ZNFX-1 (and NRDE-2) is required specifically for inheritance,
znfx-1 mutant animals exposed directly to dpy-11 dsRNA are Dpy (data
not shown). c, zu405ts is a temperature-sensitive (ts) lethal (embryonic
arrest at 20°C) allele of oma-1. oma-1 RNAi suppresses oma-1(zu405ts)
lethality, and this effect is heritable. Animals of the indicated genotypes
were exposed to oma-1 dsRNA and the fertility of the progeny of these
animals was scored over generations. Data show that znfx-1 mutant
animals are defective for oma-1 RNAi inheritance. Data are mean ± s.d. of
three biologically independent samples.
Extended Data Fig. 3 | CRISPR-Cas9-epitope tagged genes used in
this study, with one exception, produce functional proteins and are
expressed at or near wild-type levels. a-e, The addition of epitope tags
by CRISPR-Cas9-mediated gene conversion of znfx-1 or wago-4 did not
affect the function of the tagged proteins in RNAi inheritance. Shown are dpy-11 RNAi
inheritance assays in which the progeny of animals exposed to dpy-11
dsRNA are visually scored for the inheritance of Dpy phenotypes. The
indicated epitope-tagged proteins are functional in this RNAi inheritance
assay. n = 3 biologically independent samples; data are mean ± s.d. f, g,
pgl-1 mutant animals show a temperature-sensitive (25°C) sterile phenotype.
The addition of epitope tags by CRISPR-Cas9-mediated gene conversion
to the pgl-1 locus did not affect PGL-1 function, as these animals were
fertile. L4 animals were singled and shifted from 20°C to 25°C, and brood sizes were
scored. pgl-1::tagrfp (n = 6 animals) and pgl-1::mcardinal; tagrfp::znfx-1;
mut-16::gfp (n = 15 animals). h, mut-16(−) animals are defective for pos-1
RNAi. Embryos of the indicated genotype were grown on pos-1
dsRNA-expressing bacteria. Six L4 animals were picked to pos-1 dsRNA-expressing
bacteria and laid eggs overnight. Unhatched embryos and hatched animals
were scored. The addition of gfp to the mut-16 locus did not affect
MUT-16 function. Data are mean ± s.d. of three biologically independent
samples. i-k, In some cases, Flag::GFP::WAGO-4-expressing animals are
defective in RNAi inheritance, indicating that Flag::GFP::WAGO-4 is
not fully functional. i, Animals of the indicated genotypes were exposed
to dpy-11 dsRNA and F1 progeny were grown in the absence of dpy-11
dsRNA. The percentage of Dpy animals is shown. At least 150 animals
of each genotype were scored. Thus, 3×Flag::GFP::WAGO-4 is not
functional for dpy-11 RNAi inheritance. n = 3 biologically independent samples.
j, 3×Flag::GFP::WAGO-4 is functional during oma-1 RNAi inheritance.
See Extended Data Fig. 2c for details of the oma-1 RNAi inheritance
assay. n = 3 biologically independent samples; data are mean ± s.d. k, In
Fig. 2d, both wago-4 and znfx-1 mutant animals are shown to exhibit an Mrt phenotype
at 25°C. Here, 3xflag::gfp::wago-4 animals are not Mrt, indicating that
3×Flag::GFP::WAGO-4 is capable of promoting germline immortality.
n = 3 biologically independent samples; data are mean ± s.d. l-o, CRISPR
tags did not seem to affect gene expression. To address the possibility that
epitope tagging of the genes used in this study changed gene expression
levels, we isolated total RNA from animals of the indicated genotypes
and used qRT-PCR to quantify the indicated mRNA levels. Primers target
exon-intron junctions. Early stop or deletion alleles for each of these loci
were used as controls. wago-4(tm1019) and znfx-1(gg561) are deletions,
and primers were located within the deleted regions. pgl-1(bn101) and
mut-16(pk710) are nonsense alleles. A decrease in the mRNA levels of these
mutants is probably due to nonsense-mediated decay. n = 3 biologically
independent samples; data are mean ± s.d.
Extended Data Fig. 4 | ZNFX-1 acts specifically during the inheriting
phase of RNAi. a, znfx-1 is required in inheriting generations for RNAi
inheritance to occur. In brief, we initiated gene silencing in znfx-1/+
heterozygous animals and scored the +/+ and −/− progeny for their
ability to inherit gene silencing. Progeny containing at least one
wild-type copy of znfx-1 were capable of inheriting gene silencing, whereas
−/− progeny were not. More specifically, znfx-1(gg575)/+ animals that
express the pie-1::gfp::h2b transgene were exposed to gfp dsRNA, and progeny
from the F1 to F3 generations were scored. Micrographs of GFP fluorescence
in oocytes are shown. To identify cross progeny, the following strategy was
used: pie-1::gfp::h2b was marked via CRISPR with dpy-10(cn64) (dpy-10
is approximately 0.77 cM from pie-1::gfp::h2b). dpy-10(cn64)/+ animals
are Dpy Rol and dpy-10(cn64) homozygous animals are Dpy. znfx-1
genotypes were inferred based upon wild-type, Dpy and Rol phenotypes;
n > 30 animals. b, znfx-1 is sufficient in inheriting generations for RNAi
inheritance to occur. We initiated gene silencing in znfx-1(−/−) animals,
introduced a wild-type copy of znfx-1 to progeny (via mating), and scored
znfx-1/+ cross-progeny for inheritance. The data show that znfx-1(+/−)
progeny, from parents that lack a wild-type copy of znfx-1, were able to
inherit silencing. znfx-1(gg575) was marked by dpy-10(cn64) (dpy-10 is
approximately 1.09 cM from znfx-1). dpy-10(cn64)/+ animals are Dpy
Rol and dpy-10(cn64) homozygous animals are Dpy. znfx-1 genotypes were
inferred based upon wild-type, Dpy and Rol phenotypes. n > 20 animals.
c, Additional biochemical evidence that ZNFX-1 acts in inheriting
generations to promote inheritance. The nuclear RNAi factor NRDE-2
binds to the pre-mRNA of genes undergoing heritable silencing. When
znfx-1(−) animals were exposed directly to oma-1 dsRNA, NRDE-2 bound to
the oma-1 pre-mRNA at wild-type levels. However, in the progeny of
znfx-1(−) mutant animals, NRDE-2 failed to bind oma-1 pre-mRNA. Animals
expressing NRDE-2::3×Flag were treated with or without oma-1 RNAi. Extracts
were generated from these animals as well as from the progeny of these animals
(which were not treated directly with oma-1 RNAi). NRDE-2::3×Flag
was immunoprecipitated with an anti-Flag antibody, and NRDE-2
co-precipitating oma-1 pre-mRNA was quantified by qRT-PCR with exon-intron
primer sets designed to detect unspliced RNAs (pre-mRNAs) of the
oma-1 gene as well as a control germline-expressed pre-mRNA, gld-2. The
hrde-1 allele tm1200 and the znfx-1 allele gg561 were used. Data are mean ± s.d. of
the ratio of signals with versus without oma-1 RNAi; n = 3 biological replicates.
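Genotypes in panels a and b are inferred from a linked dpy-10(cn64) marker, so the error rate of that inference depends on how often recombination separates the marker from the locus it reports on. The sketch below converts the stated map distances into recombination fractions using the Haldane mapping function. This is an illustrative assumption: Haldane assumes no crossover interference, whereas C. elegans shows strong interference. At distances near 1 cM, however, the simpler estimate r ≈ d/100 gives nearly the same value, so the conclusion does not depend on the choice.

```python
import math


def haldane_r(map_distance_cm: float) -> float:
    """Recombination fraction from a map distance in cM (Haldane: no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * map_distance_cm / 100.0))


for label, d in [("dpy-10 to pie-1::gfp::h2b", 0.77), ("dpy-10 to znfx-1", 1.09)]:
    print(f"{label}: {d} cM -> r = {haldane_r(d):.4f}")
```

Both fractions are about 1% or less (roughly 0.0076 and 0.0108), which suggests that Dpy and Rol phenotypes should report the linked genotype correctly in the large majority of scored progeny.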
Extended Data Fig. 5 | WAGO-4 is an Argonaute that localizes to the
peri-nucleus and is required for RNAi inheritance. a, oma-1(zu405) is
a temperature-sensitive lethal (embryonic arrest at 20°C) allele of oma-1.
oma-1 RNAi suppresses oma-1(zu405) lethality and this effect is heritable.
Animals of the indicated genotypes were exposed to oma-1 dsRNA, and
F1 to F6 progeny were grown in the absence of oma-1 dsRNA. The number
of viable progeny of P0 (directly exposed to oma-1 RNAi) and inheriting
generations (F1 to F6, grown in the absence of oma-1 RNAi) was scored
(20°C). Data are mean ± s.d. of three biologically independent samples.
b, Animals of the indicated genotypes and expressing a pie-1::gfp::h2b
transgene were exposed to gfp dsRNA. Micrographs of animals ± gfp
RNAi as well as the F1 progeny of these animals are shown. The percentage
of animals expressing GFP is indicated and represents the scoring of at
least 90 animals in each generation and for each genotype. c, We used
CRISPR-Cas9 to append a gfp tag upstream of the predicted wago-4 ATG
start codon. Top, fluorescent micrographs of gfp::wago-4 in 2-cell, 4-cell
and ~300-cell embryos. Bottom, fluorescent micrograph of the germline of
an adult gfp::wago-4 animal. Images are representative of more than three
animals at each life stage.
Extended Data Fig. 6 | Visualization of Z granule formation with
antibodies targeting PGL-1 (P granule) and HA::ZNFX-1 (Z granule).
To control for possible artefacts caused by fluorescent epitopes, we
conducted immunofluorescence on HA::ZNFX-1-expressing animals
using anti-PGL-1 (K76, Developmental Studies Hybridoma Bank) and
anti-HA (Abcam ab9110) antibodies. a, Anti-PGL-1 and anti-HA signals
colocalized in the P2 blastomeres of 4-cell embryos. b, Anti-PGL-1
and anti-HA signals were adjacent to, yet distinct from, each other in pachytene germ
cells. No PGL-1 or HA::ZNFX-1 signal was detected in pgl-1(bn101)
animals, which do not express PGL-1 or HA::ZNFX-1, establishing that
immunofluorescent signals were specific. c, Magnification of foci from a
and b. Images in a-c are representative of three independent animals at
each life stage. Scale bars, 1 µm (a), 1 µm (b) and 0.5 µm (c).
Extended Data Fig. 7 | Z granules independently form liquid-like
condensates that do not colocalize with P bodies but localize adjacent
to EGO-1 foci, and can be physically and temporally dissociated from
P granules. a, During oocyte maturation, ZNFX-1 foci detach from
the nuclei, assume spherical shapes and move away from the nucleus;
this behaviour is consistent with Z foci being liquid-like condensates.
In addition, the data show that Z foci can exist at developmental stages
during which P granules are no longer visible, indicating that Z foci can
be temporally separated from P granules. Image is of maturing oocytes
of animals expressing the indicated fluorescent proteins. Long arrows
indicate oocytes that contain Z granules, but not P granules. Image is
representative of more than three animals. b, Z foci exhibit properties
reminiscent of liquid droplets. Left, GFP::ZNFX-1-expressing animals
were subjected to FRAP (see Methods) and fluorescence was monitored in the
bleached area over the indicated time. Data are normalized to a non-bleached
control granule from the same sample. Data are mean ± s.e.m. of n = 7
individual granules from 7 animals. Right, heat maps showing recovery
of ZNFX-1 fluorescence in a representative bleached Z granule. c, Z foci
do not colocalize with other known liquid droplets. GFP::ZNFX-1 does
not colocalize with markers of processing bodies. PATR-1 and DCAP-1
localize to processing bodies. Fluorescent micrographs of somatic
blastomeres of embryos expressing the indicated fluorescent proteins.
ZNFX-1 does not colocalize with markers of processing bodies in these
cells. Images are representative of more than three independent animals.
d, Z foci in adult pachytene germ cells do not colocalize with EGO-1.
ZNFX-1 foci form adjacent to EGO-1 foci. Fluorescent micrographs of a
single pachytene germ cell nucleus from animals expressing GFP::ZNFX-1
and TagRFP::EGO-1. A 3D render of a representative focus is shown below.
Images are representative of three independent animals. Scale bars,
0.5 µm. e, Z foci can be physically separated from P granules. Gonads
were isolated from animals expressing GFP::ZNFX-1 and PGL-1::TagRFP
and subjected to shearing force as previously described. Time-lapse imaging at 10-s
intervals is shown. A PGL-1-labelled P granule detaching from the nucleus
and flowing throughout the cytoplasm is shown (large arrow).
ZNFX-1-labelled Z granules remain immobile (small arrow). In brief, GFP::ZNFX-1 and
PGL-1::TagRFP adults were dissected to extrude gonads. Isolated gonads were
squeezed between two coverslips to generate shearing force. Coverslips
were then mounted on a slide and imaged immediately with a spinning
disc confocal. Z stacks were acquired every 10 s. Images are representative
of four independent animals.
between centres (a) and surfaces (b) of the spaces occupied by corrected for chromatic shift.
Extended Data Fig. 9 | P granule assembly factors contribute to RNAi inheritance, normal Z granule morphology and the ability of ZNFX-1 to bind mRNAs. a, DEPS-1 is required for P granule formation in adult germ cells. deps-1(bn124)/+ animals expressing the pie-1::gfp::h2b transgene were exposed to gfp dsRNA. Progeny were grown in the absence of gfp dsRNA for three generations. Fluorescent micrographs show GFP expression in oocytes. The percentage of animals expressing pie-1::gfp::h2b is shown. These data show that DEPS-1 activity is required in inheriting generations to allow for gfp RNAi inheritance. n = 32 animals for P0, n > 100 for F1 to F3. b, MEG-3/4, DEPS-1 and PGL-1 also contribute to P granule formation. dpy-11 RNAi causes animals exposed to dpy-11 dsRNA to become Dumpy (Dpy). Progeny of animals exposed to dpy-11 dsRNA inherit dpy-11 silencing and are Dpy. RNAi inheritance mutants become Dpy in response to dpy-11 RNAi; however, their progeny fail to inherit dpy-11 silencing and, therefore, are not Dpy. Animals of the indicated genotypes were exposed to dpy-11 dsRNA. F1 progeny were grown in the absence of dpy-11 dsRNA. (−) indicates non-Dpy; (+) indicates mild Dpy phenotype; (++) indicates strong Dpy. pgl-1, deps-1 and meg-3/4 are defective for dpy-11 RNAi inheritance. n = 3 biologically independent samples for each condition. c, Animals of the indicated genotypes were exposed to dpy-11 dsRNA and F1 progeny were grown in the absence of dpy-11 dsRNA. Body lengths of F1 animals were measured using ImageJ.
Data are expressed as body length of progeny of dpy-11 RNAi-treated animals divided by the average body length of control animals. The mean of n > 12 animals is shown, with P values calculated by Student's two-tailed t-test. d, In deps-1(bn124) animals, most Z granules are smaller than normal, while one Z granule per nucleus becomes enlarged. Images are from the pachytene region of the germline. Images are representative of more than three animals. e, In deps-1(bn124) animals, ZNFX-1 does not bind RNA. Wild-type or 3×Flag::ZNFX-1-expressing animals were treated with oma-1 dsRNA. ZNFX-1 was immunoprecipitated in the RNAi generation with anti-Flag antibodies, and co-precipitating RNA was subjected to qRT-PCR to quantify oma-1 mRNA co-precipitating with ZNFX-1 in wild-type or deps-1(bn124) animals. gld-2 is a germline-expressed control mRNA. Data are mean ± s.d. of three biologically independent samples. f–i, Loss of ZNFX-1 or WAGO-4 does not seem to affect the formation of Z granules marked by GFP::WAGO-4 or GFP::ZNFX-1 (f), Mutator foci marked by MUT-16::GFP (g), or P granules marked by PGL-1::RFP (h, i). Note that, in late embryonic germline development, PGL-1::TagRFP foci may not be efficiently concentrated into Z2/Z3 in wago-4 mutants (data not shown). Images are representative of more than three animals. All images in d–i were taken using a 60× objective and scaled to the same size as the other images within a panel. Scale bars, 5 μm (f, i).
A small amount of mini-charged dark matter could cool the baryons in the early Universe

Julian B. Muñoz & Abraham Loeb


The dynamics of our Universe is strongly influenced by pervasive (albeit elusive) dark matter, with a total mass about five times the mass of all the baryons. Despite this, its origin and composition remain a mystery. All evidence for dark matter relies on its gravitational pull on baryons, and thus such evidence does not require any non-gravitational coupling between baryons and dark matter. Nonetheless, some small coupling would explain the comparable cosmic abundances of dark matter and baryons, as well as solving structure-formation puzzles in the pure cold-dark-matter models. A vast array of observations has been unable to find conclusive evidence for any non-gravitational interactions of baryons with dark matter. Recent observations by the EDGES collaboration, however, suggest that during the cosmic dawn, roughly 200 million years after the Big Bang, the baryonic temperature was half of its expected value. This observation is difficult to reconcile with the standard cosmological model but could be explained if baryons are cooled down by interactions with dark matter, as expected if their interaction rate grows steeply at low velocities. Here we report that if a small fraction (less than one per cent) of the dark matter has a mini-charge, a million times smaller than the charge on the electron, and a mass in the range of 1–100 times the electron mass, then the data from the EDGES experiment can be explained while remaining consistent with all other observations. We also show that the entirety of the dark matter cannot have a mini-charge.

A new arena for the search for interactions between dark matter (DM) and baryons can be found at the cosmic dawn. During this era, the first stars were formed, and their ultraviolet emission coupled the spin temperature of neutral hydrogen to the much lower kinetic temperature, causing cosmic-microwave-background (CMB) photons with a local wavelength of 21 cm to be resonantly absorbed by the intervening neutral hydrogen. Eventually, the baryonic gas was heated by X-ray sources, and the hydrogen spin temperature rose above that of the CMB, triggering 21-cm emission. The absorption era, however, provides one of the lowest-velocity environments in our Universe: DM interactions with the visible sector mediated by a massless field, such as the photon, are expected to be most prominent at that time. We use this low-velocity environment to explore the possibility that the DM interacts with baryons through a small electric charge.

Endowing the DM with a 'mini-charge' has unique phenomenological consequences, as charged particles respond to background magnetic fields, which are rather common in astrophysical environments. It has been argued that supernova shocks would eject all mini-charged particles from the Galactic disk, and that the Galactic magnetic field, known to extend beyond Galactic heights of 3 kpc, would prevent them from re-entering the disk. Given that the DM density within 1.5 kpc of the disk is in agreement with predictions, we conclude that not all DM can be evacuated
from the disk, and thus any charged particles with charge-to-mass ratios larger than $\epsilon/m_\chi \gtrsim 5\times10^{-16}\ {\rm MeV^{-1}}$ are precluded from being the entirety of the cosmological DM. Throughout this paper, we define the DM mini-charge $\epsilon$ in units of the electron charge $e$, so $\epsilon = e_\chi/e$, where $e_\chi$ is the DM charge. A comparable constraint can be obtained from studying galaxy clusters, which we detail in the Methods section. We note that, even though the precise numerical value of these limits on $\epsilon/m_\chi$ can be altered by different assumptions, the charge required for DM to cool the baryons would be orders of magnitude larger.

These constraints would not apply if only a fraction of the DM is charged, as most of the DM would behave as expected. Given that the local DM measurements are accurate to within tens of per cent, we focus on the possibility that the mini-charged particles constitute a small fraction $f_{\rm dm} < 0.1$ of the DM, while the rest of it is neutral. This can be naturally achieved if DM forms 'dark atoms', with a small charged-DM fraction remaining free after its recombination, although we make no assumptions about the origin of the mini-charged particles. The momentum-transfer cross-section between a mini-charged particle and a target $t$ (electron or proton) is


$$\sigma_t = \frac{4\pi\,\alpha^2\,\epsilon^2\,\hbar^2 c^2\,\xi}{\mu_{\chi,t}^2\,v^4} \qquad (1)$$


where $\mu_{\chi,t}$ is the reduced mass of the target and DM, $v$ is the relative velocity between the two particles, $\alpha$ is the fine-structure constant, $c$ the speed of light, $\hbar$ the reduced Planck constant and $\xi$ the Debye logarithm, which we compute in the Methods section. The velocity dependence of this cross-section is that of Rutherford scattering, growing as the DM–baryon fluid becomes slower and, by extension, colder.
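To make the steep velocity dependence concrete, the Python sketch below evaluates a Rutherford-like momentum-transfer cross-section in cgs units, converting from natural units with $\hbar c$. It assumes a $4\pi$ prefactor and a fixed, hypothetical Debye logarithm `xi = 40`; the mass, charge and velocity values are illustrative and not taken from the text. The only point it is meant to show is the $v^{-4}$ scaling: halving the relative velocity raises the cross-section sixteenfold.

```python
import math

ALPHA = 1.0 / 137.035999          # fine-structure constant
HBAR_C_MEV_CM = 1.973269804e-11   # hbar*c in MeV*cm, converts MeV^-2 to cm^2
M_PROTON_MEV = 938.272

def momentum_transfer_xsec(eps, m_chi_mev, m_t_mev, v_over_c, xi):
    """Rutherford-like momentum-transfer cross-section in cm^2.

    Assumes sigma = 4*pi*alpha^2*eps^2*xi / (mu^2 v^4) in natural units;
    the O(1) prefactor is an assumption of this sketch.
    """
    mu = m_chi_mev * m_t_mev / (m_chi_mev + m_t_mev)                         # reduced mass [MeV]
    sigma_nat = 4.0 * math.pi * ALPHA**2 * eps**2 * xi / (mu**2 * v_over_c**4)  # [MeV^-2]
    return sigma_nat * HBAR_C_MEV_CM**2

# Hypothetical inputs: eps = 1e-6, 10-MeV DM scattering on protons, xi = 40
s_fast = momentum_transfer_xsec(1e-6, 10.0, M_PROTON_MEV, 1e-5, 40.0)
s_slow = momentum_transfer_xsec(1e-6, 10.0, M_PROTON_MEV, 0.5e-5, 40.0)
print(f"sigma(v)   = {s_fast:.3e} cm^2")
print(f"sigma(v/2) = {s_slow:.3e} cm^2, ratio = {s_slow / s_fast:.1f}")
```

Because $\sigma_t \propto v^{-4}$, the momentum-exchange rate $n\sigma_t v$ scales as $v^{-3}$, which is why the cold, slow post-recombination epoch is the favourable regime for such interactions.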
The relative velocity between the DM and baryons is not determined by their thermal motion alone. The gravitational infall of baryons is impeded until hydrogen recombination occurs, whereas the DM streams freely, causing a velocity difference between them. Interactions between DM and baryons cause a drag on this velocity, which eventually leads to mechanical equilibrium of the two fluids, dissipating the relative motion into thermal energy. Additionally, interactions between DM and baryons tend to bring the two fluids into thermal equilibrium, equating their temperatures. Given that the DM is very cold, this can severely lower the baryon temperature. To reduce the baryonic temperature substantially (as is required to explain the EDGES data) with DM–baryon interactions, equipartition demands the existence of at least as many mini-charged particles as baryons, which translates into a DM mass $m_\chi \lesssim \mu_b m_p f_{\rm dm}\,(\Omega_c/\Omega_b)$, where $\mu_b$ is the mean molecular weight of baryons, $m_p$ is the proton mass, and $\Omega_c$ and $\Omega_b$ are the cosmic abundances of DM and baryons, respectively. Thus, for each value of $f_{\rm dm}$ we study only the mass range $m_\chi \lesssim 6.2\ {\rm GeV} \times f_{\rm dm}$.
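The equipartition bound can be checked numerically. The sketch below assumes Planck-like values $\Omega_c h^2 = 0.120$ and $\Omega_b h^2 = 0.0224$ and a primordial helium mass fraction $Y_p = 0.245$ (none of which are quoted in the text), and takes $\mu_b = 1/[(1-Y_p) + Y_p/4]$ as the mean mass per baryonic nucleus in proton masses. With these inputs the bound comes out close to the 6.2 GeV × $f_{\rm dm}$ used above.

```python
# Assumed cosmological inputs (not quoted in the text)
OMEGA_C_H2 = 0.120
OMEGA_B_H2 = 0.0224
Y_P = 0.245              # primordial helium mass fraction
M_PROTON_GEV = 0.938272

def mean_molecular_weight(y_p: float = Y_P) -> float:
    """Mean mass per H or He nucleus in units of m_p (helium taken as 4 m_p)."""
    return 1.0 / ((1.0 - y_p) + y_p / 4.0)

def max_dm_mass_gev(f_dm: float) -> float:
    """Largest m_chi with n_chi >= n_b, i.e. m_chi <= mu_b m_p f_dm Omega_c / Omega_b."""
    return mean_molecular_weight() * M_PROTON_GEV * f_dm * OMEGA_C_H2 / OMEGA_B_H2

for f in (1.0, 1e-2, 1e-3):
    print(f"f_dm = {f:g}: m_chi <~ {max_dm_mass_gev(f) * 1e3:.1f} MeV")
```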

We solve for the thermodynamical evolution of the DM and baryonic fluids simultaneously, accounting for their relative velocity, as well as the small free-electron fraction left after recombination (see Methods section for details). This yields the baryonic temperature $T_b(z, v_{\chi b,0})$ as a function of the initial DM–baryon relative velocity, $v_{\chi b,0}$. To remove dependencies on the astrophysics of the coupling between the spin and kinetic temperatures, we define the average baryonic temperature as

$$\langle T_b(z)\rangle = \int d^3 v_{\chi b,0}\, \mathcal{P}(v_{\chi b,0})\, T_b(z, v_{\chi b,0}) \qquad (2)$$
where the probability distribution function of the initial velocity, $\mathcal{P}(v_{\chi b,0})$, is given by a Maxwell–Boltzmann distribution with root-mean-square (r.m.s.) velocity $V_{\rm rms} = 29$ km s$^{-1}$ at decoupling.
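Equation (2) is a three-dimensional Maxwell–Boltzmann average, which is straightforward to estimate by Monte Carlo. In the Python sketch below each Cartesian component of the initial relative velocity is drawn from a Gaussian with dispersion $V_{\rm rms}/\sqrt{3}$, so that the speed has the stated r.m.s. The function `toy_tb` is a purely hypothetical stand-in for the numerically evolved $T_b(z, v_{\chi b,0})$, which the text does not give in closed form.

```python
import numpy as np

V_RMS_KM_S = 29.0  # r.m.s. DM-baryon relative velocity at decoupling (from the text)

def sample_initial_speeds(n: int, rng: np.random.Generator) -> np.ndarray:
    """Draw |v_chib,0| from a 3D Maxwell-Boltzmann distribution with r.m.s. V_RMS_KM_S."""
    sigma_1d = V_RMS_KM_S / np.sqrt(3.0)
    v = rng.normal(0.0, sigma_1d, size=(n, 3))
    return np.linalg.norm(v, axis=1)

def average_tb(tb_of_v, n: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of <T_b> = integral d^3v P(v) T_b(v)."""
    rng = np.random.default_rng(seed)
    return float(np.mean(tb_of_v(sample_initial_speeds(n, rng))))

def toy_tb(v_km_s: np.ndarray) -> np.ndarray:
    """Hypothetical T_b(z, v) in kelvin at fixed z: faster streaming -> more friction heating."""
    return 3.5 + 0.002 * v_km_s**2

print(f"<T_b> = {average_tb(toy_tb):.2f} K (toy model)")
```

Drawing Cartesian components rather than sampling the speed distribution directly keeps the sketch short and automatically includes the $4\pi v^2$ phase-space factor.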

Figure 1 shows, for different values of $f_{\rm dm}$, the lines in the $\epsilon$–$m_\chi$ plane that would produce enough baryonic cooling to explain the EDGES measurement. Given $f_{\rm dm}$, the required mini-charge scales as $\epsilon \propto m_\chi$. There is, however, no simple analytic solution for the slope of this line as a function of $f_{\rm dm}$, since for small energy transfers we expect the baryon heating to


be Q, xf ae e’/m oO whereas for large energy transfers (and assuming 
jin 02), OO, ef? te ra /m.. We have empirically found that 


d 
a =“ 3) 
MeV 


10°? 
is sufficient to reduce the baryonic temperature by a factor of 2, although we emphasize that the 21-cm results in Fig. 1 have been calculated numerically for each value of $f_{\rm dm}$.

Next, we summarize the relevant constraints on mini-charged DM. Mini-charged-particle production during supernova SN1987A would have altered its neutrino luminosity, thus constraining the range 10-7 <e< 107°, which we label SN1987A in Fig. 1. A search for mini-charged particles at SLAC National Accelerator Laboratory placed constraints on mini-charges larger than $\epsilon \approx 10^{-4}$ for $m_\chi < 100$ MeV. We show this constraint in Fig. 1, labelled SLAC mQ. Measurements
of the matter power spectrum, from the CMB and the Lyman-α forest, can only constrain mini-charged particles if they compose a major part of the DM. Otherwise, even particles with mini-charges
€>10-%(m oF MeV)? which would be in thermal contact with baryons 
at the CMB epoch, are allowed to compose up to 1% of the DM. This constraint, nonetheless, closes the apparent gap for $m_\chi \gtrsim 200$ MeV in Fig. 1, as, above this threshold, the 21-cm data would require more than 1% of the DM to have a charge. Thus, we focus on the $f_{\rm dm} \lesssim 10^{-2}$ range for the rest of this paper.

The cosmology of mini-charged particles can place additional con- 
straints on their charge. Particles with mini-charges larger than 
«> 10-%(m uf MeV)'/2, which encompasses the region of interest, would 
reach equilibrium with the visible sector in the early Universe. This 
places limits on mini-charged particles lighter than electrons, since they 
Fig. 1 | Regions of the mini-charged-particle parameter space explored by 21-cm observations, and current constraints. Each thick solid line (from black to red) represents the mini-charge required to reduce the baryonic temperature by a factor of two, as reported by EDGES, if a fraction $f_{\rm dm}$ of the DM is mini-charged. We require $m_\chi \lesssim 6.2\ {\rm GeV} \times f_{\rm dm}$ to produce enough cooling (with lines ending abruptly at that mass), and $f_{\rm dm} < 10^{-1}$, as larger values are ruled out by other observations. The coloured hatched regions are excluded by different datasets, and the green long-dashed line represents the mini-charge required to obtain the appropriate DM abundance.
Fig. 2 | Brightness temperature of 21-cm emission as a function of redshift. We assume full Lyman-α coupling and no X-ray heating, both in the case with (solid line) and without (dotted line) DM–baryon interactions. Negative temperatures indicate absorption. The red data point represents the data from EDGES, of $T_{21} = -500^{+200}_{-500}$ mK, at $3\sigma$ (99.73% confidence levels shown by error bar). We also show (dashed blue line) the r.m.s. of the 21-cm temperature due to velocity fluctuations, multiplied by a factor of −50.


would appear as additional light degrees of freedom (parametrized as a change in the effective number $N_{\rm eff}$ of neutrino species) during Big Bang nucleosynthesis (BBN). We label the constrained region in Fig. 1 as BBN. Moreover, if a dark photon is the origin of the mini-charge, it can also alter BBN and the CMB through the same mechanism. We estimate this constraint in the Methods section, and label it as $N_{\rm eff}$ in Fig. 1. In the standard freeze-out scenario, the DM production is halted
when the baryonic temperature drops below its mass, and its annihila- 
tion rate determines the relic abundance left in the dark sector. We 
compute the mini-charge required to produce the appropriate DM 
abundance and plot it as the dashed line in Fig. 1. It is clear that— 
barring a small region for m, of a few MeV, ande > 10 °—most of the 
parameter space that we are considering is below this thermal-relic line, 
thus requiring new interactions to allow the DM to annihilate effi- 
ciently. We leave this challenge for future model building of the needed 
dark sector. 

Let us now study how a change in baryonic temperature translates into an observable 21-cm brightness temperature. The 21-cm temperature depends on the inverse of the gas temperature during the cosmic dawn (see equation (15) in Methods), and is shown in Fig. 2 for a specific choice of $\epsilon/m_\chi$ and $f_{\rm dm}$. The EDGES data are in tension with the maximum absorption possible in the standard model, whereas this tension is resolved when introducing mini-charged DM particles. Thus, we conclude that a subpercent fraction of the DM with an electric charge $e_\chi \approx 10^{-6}e$ and a mass of about 1–60 MeV can cool the baryons considerably, while being consistent with all current constraints. This scenario predicts new inhomogeneities in the baryon temperature, since the DM–baryon relative velocity, which fluctuates over Mpc scales, modulates the overall cooling and heating, thus forming a source of additional 21-cm fluctuations. We can estimate the size of these fluctuations by computing the root-mean-square 21-cm brightness temperature as a function of the DM–baryon velocity. Figure 2 shows that the same interactions that cause baryonic cooling also lead to additional 21-cm fluctuations at the level of a few per cent. These are comparable to the Mpc-scale adiabatic fluctuations at $z \approx 17$, and are potentially detectable with upcoming 21-cm interferometers, such as HERA. Their detection would confirm that DM and baryons were in thermal contact during the cosmic dawn, and would thus constitute an indication of DM physics beyond the standard model.


METHODS


Cluster constraints. Additional constraints on mini-charged particles can be obtained by requiring that the DM must not be trapped in regions of coherent magnetic field within galaxy clusters, which have typical correlation lengths $r_{\rm corr}$ of the order of 10 kpc and field strengths $B \approx 5\ \mu{\rm G}$. This means that particles with charges larger than $\epsilon/m_\chi \gtrsim 3\times10^{-17}\ {\rm MeV^{-1}}$ would not be distributed as cold DM but would instead clump wherever magnetic fields are coherent. Additional constraints can be derived through plasma effects in cluster collisions, such as the Bullet Cluster, as well as by requiring the mini-charged particles not to diffuse within clusters, although simulations would be required to isolate these effects from nonlinear gravity.
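The cluster bound follows from comparing a particle's gyroradius, $r_g = m_\chi v/(\epsilon e B)$, with the coherence length $r_{\rm corr}$. The Python sketch below evaluates the critical $\epsilon/m_\chi$ at which $r_g = r_{\rm corr}$, using the field strength and correlation length quoted above. The DM velocity is not given in the text, so the values used here (a few hundred to a thousand km s$^{-1}$, typical of cluster velocity dispersions) are assumptions. For these inputs the estimate lands at the $10^{-17}$ MeV$^{-1}$ order of magnitude.

```python
E_CHARGE_C = 1.602176634e-19     # elementary charge [C]
MEV_TO_KG = 1.78266192e-30       # mass of 1 MeV/c^2 [kg]
KPC_TO_M = 3.0857e19             # kiloparsec [m]
GAUSS_TO_TESLA = 1e-4

def critical_eps_over_m(b_gauss: float, r_corr_kpc: float, v_km_s: float) -> float:
    """eps/m_chi [MeV^-1] above which the non-relativistic gyroradius drops below r_corr."""
    momentum_per_mev = MEV_TO_KG * v_km_s * 1e3   # momentum of a 1-MeV particle [kg m/s]
    e_b_r = E_CHARGE_C * b_gauss * GAUSS_TO_TESLA * r_corr_kpc * KPC_TO_M
    return momentum_per_mev / e_b_r

# B ~ 5 microgauss and r_corr ~ 10 kpc from the text; velocities are assumed
for v in (400.0, 1000.0):
    print(f"v = {v:.0f} km/s -> eps/m_chi ~ {critical_eps_over_m(5e-6, 10.0, v):.1e} MeV^-1")
```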

Debye logarithm. Here, and throughout the Methods section, we work in natural units, where $\hbar = c = 1$, and also set the Boltzmann constant to unity. In the main text we defined the Debye logarithm $\xi$, which regulates the forward divergence of the momentum-transfer integral. This factor is roughly constant during the era of interest, so we set it to
where we adopt a fiducial baryonic temperature $T_b = 10$ K, and evaluate the free-electron fraction $x_e$ and the number density $n_H$ of hydrogen nuclei at redshift $z = 20$.

Drag and heating terms. DM–baryon interactions cause a drag $D(v_{\chi b}) \equiv dv_{\chi b}/dt$ on their relative velocity, which we recast as
where $\rho_b$ is the baryon energy density. The number density of mini-charged DM is given by $n_\chi = f_{\rm dm}\rho_\chi/m_\chi$, where $\rho_\chi$ is the (total) DM energy density at redshift $z$, and $m_t$ is the target mass. Here we have defined the function


$$F(r) = {\rm erf}\left(\frac{r}{\sqrt{2}}\right) - \sqrt{\frac{2}{\pi}}\; r\, e^{-r^2/2} \qquad (6)$$


where $r_t \equiv v_{\chi b}/u_{{\rm th},t}$, and the thermal sound speed of the DM–target fluid is given by
$$u_{{\rm th},t} = \sqrt{\frac{T_b}{m_t} + \frac{T_\chi}{m_\chi}} \qquad (7)$$


where $T_\chi$ is the mini-charged-DM temperature. By comparing this sound speed with the relative velocity, we see that in the standard case of $T_\chi = 0$, immediately after recombination (and before the X-ray heating of the baryons), the baryonic sound speed falls below the DM–baryon relative velocity, making the DM–proton fluid (albeit not the DM–electron fluid) 'supersonic'.
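Equations (6) and (7) translate directly into code. The Python sketch below (natural units, with temperatures and masses in the same energy unit) implements $F(r)$ and $u_{{\rm th},t}$; `mach_ratio` is a hypothetical helper name for $r_t$. $F$ vanishes as $\sqrt{2/\pi}\,r^3/3$ for $r \to 0$ and tends to 1 for $r \gg 1$, which is what separates the thermal and 'supersonic' regimes discussed above.

```python
import math

def F(r: float) -> float:
    """F(r) = erf(r/sqrt(2)) - sqrt(2/pi) * r * exp(-r^2/2), equation (6)."""
    return math.erf(r / math.sqrt(2.0)) - math.sqrt(2.0 / math.pi) * r * math.exp(-0.5 * r * r)

def u_th(t_b: float, m_t: float, t_chi: float, m_chi: float) -> float:
    """Thermal sound speed of the DM-target fluid, equation (7), natural units."""
    return math.sqrt(t_b / m_t + t_chi / m_chi)

def mach_ratio(v_chib: float, t_b: float, m_t: float, t_chi: float, m_chi: float) -> float:
    """r_t = v_chib / u_th,t; r_t > 1 means the DM-target flow is 'supersonic'."""
    return v_chib / u_th(t_b, m_t, t_chi, m_chi)

for r in (0.1, 1.0, 3.0):
    print(f"F({r}) = {F(r):.4f}")
```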

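In addition to damping the DM–baryon relative velocity, these interactions give rise to a baryonic heating

$$\dot{Q}_b = \frac{n_\chi\, x_e}{1+f_{\rm He}} \sum_{t=e,p} \frac{m_\chi m_t\, \sigma_0}{(m_\chi+m_t)^2\, u_{{\rm th},t}} \left[\sqrt{\frac{2}{\pi}}\, \frac{e^{-r_t^2/2}}{u_{{\rm th},t}^2}\, (T_\chi - T_b) + m_\chi \frac{F(r_t)}{r_t}\right] \qquad (8)$$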


where $f_{\rm He} \equiv n_{\rm He}/n_H \approx 0.08$. Here, we have included DM interactions with both protons and electrons, as the latter can dominate if the DM fluid is not cold. The DM heating can be found by symmetry, through the transformations $n_\chi \leftrightarrow n_b$, $m_\chi \leftrightarrow m_t$ and $T_\chi \leftrightarrow T_b$. This heating can be positive or negative depending on $r_t$: for $r_t \to 0$ (corresponding to $v_{\chi b} \ll u_{{\rm th},t}$) only the temperature-dependent term survives, corresponding to the usual thermalization, whereas for $r_t \gg 1$ (which implies $v_{\chi b} \gg u_{{\rm th},t}$) the heating term proportional to $F(r_t)$ dominates, converting the mechanical energy of the relative velocity into heat for both fluids.
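The sign structure of equation (8) can be illustrated with a single-target version of the bracket. The Python sketch below works in natural units with arbitrary, hypothetical inputs; `sigma0` stands for the velocity-independent normalization of the cross-section, and its value, like the overall prefactor, is an assumption. The point is only qualitative: with no relative velocity and cold DM the term is negative (the baryons cool), while a large relative velocity lets the friction term dominate and the term turns positive.

```python
import math

def F(r: float) -> float:
    """Equation (6)."""
    return math.erf(r / math.sqrt(2.0)) - math.sqrt(2.0 / math.pi) * r * math.exp(-0.5 * r * r)

def heating_single_target(n_chi, x_e, f_he, sigma0, m_chi, m_t, t_chi, t_b, v_chib):
    """One t-term of equation (8): baryon heating from DM scattering on target t."""
    u = math.sqrt(t_b / m_t + t_chi / m_chi)   # equation (7)
    r = v_chib / u
    # F(r)/r -> 0 as r -> 0 (F ~ r^3), so the friction term vanishes at rest
    f_over_r = F(r) / r if r > 0.0 else 0.0
    prefactor = n_chi * x_e / (1.0 + f_he) * m_chi * m_t * sigma0 / ((m_chi + m_t) ** 2 * u)
    thermal = math.sqrt(2.0 / math.pi) * math.exp(-0.5 * r * r) / u**2 * (t_chi - t_b)
    friction = m_chi * f_over_r
    return prefactor * (thermal + friction)

# Hypothetical inputs in MeV-based natural units: 10-MeV DM on protons, T_b ~ 1e-9 MeV (~10 K)
args = dict(n_chi=1.0, x_e=2e-4, f_he=0.08, sigma0=1.0, m_chi=10.0, m_t=938.0, t_chi=0.0, t_b=1e-9)
print(heating_single_target(v_chib=0.0, **args))   # cold DM at rest: negative (cooling)
print(heating_single_target(v_chib=1e-4, **args))  # fast streaming: positive (friction heating)
```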
Dark-matter sound speed. For illustration purposes, we note that transferring half of the baryonic thermal energy to the DM at $z = z_{\rm central}$ would induce a DM sound speed of


2 T_T 2 _ (.1kms *)? 
we py BOE (9) 
my Hy Fam c Fam 
at $z = z_{\rm central} \approx 17$, where the EDGES data lie. Interestingly, for $f_{\rm dm} \lesssim 0.2$, this gives the mini-charged component of the DM a sound speed larger than that of protons, thus setting the value of $u_{{\rm th},p}$ in equation (7). Moreover, we estimate that
for f,,S 10” the DM-electron interactions will dominate over the DM-proton
ones, owing to this velocity. This would, however, suppress the matter power spec- 
trum only on extremely small scales and for a minority of the DM. 
Thermal evolution. With the heat and drag terms from equations (5) and (8), we 
can calculate the thermal evolution of the DM and baryon fluids as 


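$$\dot{T}_b = -2HT_b + \frac{2\dot{Q}_b}{3} + \Gamma_C\,(T_\gamma - T_b) \qquad (10a)$$
$$\dot{T}_\chi = -2HT_\chi + \frac{2\dot{Q}_\chi}{3} \qquad (10b)$$
$$\dot{x}_e = -C\left[n_H \mathcal{A}_B\, x_e^2 - 4(1-x_e)\,\mathcal{B}_B\, e^{-3E_0/(4T_\gamma)}\right] \qquad (10c)$$
$$\dot{v}_{\chi b} = -H v_{\chi b} - D(v_{\chi b}) \qquad (10d)$$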


where $H$ is the Hubble parameter at time $t$, $C$ is the recombination factor, $E_0$ is the ground-level energy of hydrogen, and $\mathcal{A}_B$ and $\mathcal{B}_B$ are the effective case-B recombination and photoionization coefficients. We have ignored photoheating and recombination cooling, as well as possible baryonic heating due to DM annihilations, if these were present. Here $T_\gamma$ is the temperature of the CMB photons, and the Compton thermalization rate is


$$\Gamma_C = \frac{8\,\sigma_T\, a_r\, T_\gamma^4\, x_e}{3\,(1 + f_{\rm He} + x_e)\, m_e} \qquad (11)$$


where $\sigma_T$ is the Thomson cross-section, and $a_r$ is the radiation constant.
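To see why the Compton term in equation (10a) switches off during the cosmic dawn, the Python sketch below evaluates equation (11) in cgs units (restoring $c$ through $m_e c$) and compares it with the Hubble rate. The flat-ΛCDM parameters and the free-electron fractions passed in (roughly 0.1 near recombination and $2\times10^{-4}$ near $z \approx 20$) are assumed, illustrative values rather than outputs of the authors' code.

```python
import math

SIGMA_T = 6.6524587e-25                  # Thomson cross-section [cm^2]
A_RAD = 7.5657e-15                       # radiation constant [erg cm^-3 K^-4]
M_E_C = 9.1093837e-28 * 2.99792458e10    # electron mass times c [g cm s^-1]
T_CMB0 = 2.7255                          # present-day CMB temperature [K]
F_HE = 0.08                              # n_He / n_H

def gamma_compton(z: float, x_e: float) -> float:
    """Compton thermalization rate of equation (11), in s^-1."""
    t_gamma = T_CMB0 * (1.0 + z)
    return 8.0 * SIGMA_T * A_RAD * t_gamma**4 * x_e / (3.0 * (1.0 + F_HE + x_e) * M_E_C)

def hubble(z: float, h: float = 0.677, omega_m: float = 0.31, omega_r: float = 9.1e-5) -> float:
    """Flat LCDM Hubble rate in s^-1 (assumed parameters)."""
    h0 = h * 100.0 * 1e5 / 3.0857e24     # km/s/Mpc -> s^-1
    omega_l = 1.0 - omega_m - omega_r
    return h0 * math.sqrt(omega_m * (1 + z) ** 3 + omega_r * (1 + z) ** 4 + omega_l)

for z, x_e in ((1000.0, 0.1), (20.0, 2e-4)):
    print(f"z = {z:.0f}: Gamma_C / H = {gamma_compton(z, x_e) / hubble(z):.2e}")
```

Once $\Gamma_C \ll H$, equation (10a) reduces to adiabatic cooling, $T_b \propto (1+z)^2$, unless the $\dot{Q}_b$ term intervenes.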
Stellar production. If the mini-charged particles were lighter than about 100 keV, 
they could be produced in stars, such as white dwarfs and red giants, cooling these 
objects too rapidly. This tightly constrains their charge, although, given that BBN
already rules out masses above that limit we do not show these constraints in Fig. 1. 
$N_{\rm eff}$ constraint. We can estimate the value of the mini-charge for which dark photons would be produced by requiring that the timescale for two mini-charged particles to annihilate into dark photons is longer than the Hubble time. For mini-charged particles in thermal equilibrium with standard-model particles in the early Universe, their rate of annihilation into dark photons is


Pen r= nove 10° 39/4 T. 


XX17 (12) 


where $g'$ is the coupling constant between $\chi$ and $\gamma'$. By requiring this rate to be smaller than the Hubble rate $H \sim T^2/M_{\rm Pl}$, where $M_{\rm Pl}$ is the reduced Planck mass, we can obtain a constraint on $g'$, so that DM does not annihilate to dark photons before $T = m_\chi$. Because the DM mini-charge is the product of the dark-photon mixing $\kappa$ and the dark coupling $g'$, and we require $\kappa \lesssim 1$, this translates into a constraint


$$\epsilon \lesssim 2\times10^{-5} \left(\frac{m_\chi}{\rm MeV}\right)^{1/4} \qquad (13)$$
for $m_\chi \lesssim \Lambda_{\rm QCD} \approx 200$ MeV, where $\Lambda_{\rm QCD}$ is the quantum chromodynamics scale, which we label as $N_{\rm eff}$ in Fig. 1. Annihilations of $\chi$ particles into $\gamma\gamma'$, or Compton-like processes ($\chi\gamma \to \chi\gamma'$), would be suppressed by a $\kappa^2$ factor, and at most change the constraint by an $\mathcal{O}(1)$ factor. Notice that, even for $\kappa \approx 1$, if the dark photon is massless we can define the photon to be the linear combination of bosons that couples to our sector, in which case only mini-charged particles would interact with baryons. Here we have assumed that $\chi$ is a spin-1/2 particle, and we note that this constraint can, of course, be tightened if $\kappa \ll 1$, and can extend to DM masses as high as 1 GeV (ref. 20). We note, however, that even though dark photons are a possible way to obtain mini-charged particles, they might not be necessary, which would render these constraints invalid.

Thermal-relic mini-charge. To compute the mini-charge required to produce the right DM abundance, we use the approximate formula $\Omega_\chi h^2 \approx 0.1\,(x_f/10)\,[10^{-26}\ {\rm cm^3\,s^{-1}}/\langle\sigma v\rangle]$, where $x_f = m_\chi/T_f$, $T_f$ is the freeze-out temperature, and for mini-charged DM the annihilation cross-section to fermions is


(ov) = x, 1 (14) 
m \ 


x 


2 
x
We ignore any dark-sector interactions and, for simplicity, consider only annihilation into electron–positron pairs. To obtain a simple estimate, we further approximate $x_f$ to be a constant, as it depends only logarithmically on the DM mass and charge, and find the region of the $\epsilon$–$m_\chi$ plane that produces the right DM relic abundance. In the thermal-relic calculation we have assumed $f_{\rm dm} = 1$, although, of course, to obtain a small fraction of DM with mini-charges, the rest of it would have to form dark atoms, or otherwise have a larger mini-charge.
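A rough version of this estimate fits in a few lines. The Python sketch below inverts the approximate relic formula for $\epsilon$ at fixed $m_\chi$, taking its normalization as $10^{-26}$ cm³ s⁻¹. As a stand-in for equation (14) it assumes the leading-order s-wave cross-section $\langle\sigma v\rangle \approx \pi\alpha^2\epsilon^2/m_\chi^2$ for annihilation into $e^+e^-$ with $m_e \ll m_\chi$, a constant $x_f = 15$ and a target $\Omega_\chi h^2 = 0.12$. All of these are assumptions of the sketch, so only the $\epsilon \propto m_\chi$ scaling and the rough magnitude should be trusted.

```python
import math

ALPHA = 1.0 / 137.035999
HBAR_C_MEV_CM = 1.973269804e-11   # MeV cm
C_CM_S = 2.99792458e10            # cm / s

def sigma_v_cm3_s(eps: float, m_chi_mev: float) -> float:
    """Assumed <sigma v> = pi alpha^2 eps^2 / m_chi^2, converted from MeV^-2 to cm^3/s."""
    sv_natural = math.pi * ALPHA**2 * eps**2 / m_chi_mev**2
    return sv_natural * HBAR_C_MEV_CM**2 * C_CM_S

def relic_omega_h2(eps: float, m_chi_mev: float, x_f: float = 15.0) -> float:
    """Omega h^2 ~ 0.1 (x_f/10) [1e-26 cm^3 s^-1 / <sigma v>]."""
    return 0.1 * (x_f / 10.0) * 1e-26 / sigma_v_cm3_s(eps, m_chi_mev)

def thermal_relic_eps(m_chi_mev: float, omega_h2: float = 0.12, x_f: float = 15.0) -> float:
    """Mini-charge giving the target abundance; <sigma v> scales as eps^2."""
    target_sv = 0.1 * (x_f / 10.0) * 1e-26 / omega_h2
    return math.sqrt(target_sv / sigma_v_cm3_s(1.0, m_chi_mev))

for m in (1.0, 3.0, 10.0):
    print(f"m_chi = {m:4.1f} MeV -> eps_relic ~ {thermal_relic_eps(m):.1e}")
```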
Cosmology of mini-charged particles. Only a sliver of parameter space is compatible with the right relic cross-section, so one may wonder how mini-charged particles were produced. If they were heavier than about 200 MeV, they could annihilate to light dark-sector particles while not leaving a trace in current CMB probes. However, for DM lighter than about 200 MeV, the standard-model plasma has been heated by the QCD phase transition, and any populated light degrees of freedom in the dark sector would alter the CMB anisotropies, which is strongly disfavoured by Planck data. In that case, other mechanisms would have to be invoked to set the right DM abundance.

Direct-detection constraints. Interestingly, mini-charged particles can remain in the Galactic disk if they cool efficiently, which for DM–electron cooling requires
€>1075(m,/MeV)!”, or otherwise can be achieved through dark-sector interac- 
tions. Moreover, we estimate the Galactic magnetic-field energy density to be at 
least three orders of magnitude smaller than the DM kinetic energy density in the solar vicinity. Thus, DM could be able to break through magnetic field lines and re-enter the disk (albeit altering the magnetic field structure of our Galaxy). We
note, however, that this might not lead to direct-detection signals in dark-matter 
detectors, as the gyroradius of one of these particles would be rz + 10 km 
(m,/MeV)(€/10~°)~! on the terrestrial magnetic field of Bs + 0.1 G. This would 
imply, though, that these particles can produce atmospheric ionizations, acting as a 
nox borealis, similar to the regular aurora borealis produced by solar-wind particles. 
We point out, however, that Earth-based experiments could be sensitive to even a 
minuscule trace of mini-charged particles, as these can interact rather strongly. 
Given that neither disk ejection, owing to the complex astrophysics of the inter- 
stellar medium, nor the terrestrial magnetic field would have perfect efficiency in 
shielding the Earth from these particles, there might be hope for direct detection, 
especially through space-based experiments. One example is given by the limits from the X-ray calorimeter, which, however, do not constrain these particles if their masses
are below about 100 MeV. Additionally, we estimate that torsion-balance experi- 
ments*°, which can constrain accelerations as small as 107! cm s~?, could be 
sensitive to mini-charges of the order of 10-°(m,/MeV), ifthe DM density on the 
Earth’s surface was 1% of its usual value. This result is comparable to that required 
for baryonic cooling during cosmic dawn, although the specific number is con- 
trolled by the fraction of the DM that diffuses to the solar vicinity. More impor- 
tantly, the cross-section of these mini-charged particles would be similar to the 
atmospheric column density, so any constraints obtained on the surface would 
depend strongly on the DM momentum loss during atmospheric entry. 

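The 21-cm temperature. The brightness temperature of the 21-cm line can be written as

$$T_{21}(z) = 27\ {\rm mK}\; x_{\rm HI} \left(\frac{\Omega_b h^2}{0.023}\right) \left(\frac{0.15}{\Omega_m h^2}\,\frac{1+z}{10}\right)^{1/2} \left(1 - \frac{T_\gamma}{T_s}\right) \qquad (15)$$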
where $x_{\rm HI}$ is the neutral-hydrogen fraction, $h$ is the reduced Hubble constant, $\Omega_m = \Omega_c + \Omega_b$, and $T_s$ is the spin temperature of the hydrogen gas. We use the solutions for $T_b$ of equations (10a–d), assuming full Lyman-α coupling (so $T_s = T_b$), to obtain the sky-averaged 21-cm temperature

$$\langle T_{21}(z)\rangle = \int d^3 v_{\chi b,0}\, \mathcal{P}(v_{\chi b,0})\, T_{21}\left[T_b(z, v_{\chi b,0})\right] \qquad (16)$$
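Equation (15) with $T_s = T_b$ is easy to evaluate directly. The Python sketch below assumes $x_{\rm HI} = 1$, $\Omega_b h^2 = 0.0224$, $\Omega_m h^2 = 0.143$ and $T_{\gamma,0} = 2.7255$ K (Planck-like values not quoted in the text). The spin temperatures fed to it are illustrative inputs, not outputs of equations (10a–d). It shows how halving $T_s$ deepens the absorption trough, which is the lever that DM cooling pulls on.

```python
import math

T_CMB0 = 2.7255  # present-day CMB temperature in K (assumed)

def t21_mk(z: float, t_s: float, x_hi: float = 1.0,
           omega_b_h2: float = 0.0224, omega_m_h2: float = 0.143) -> float:
    """21-cm brightness temperature of equation (15), in mK."""
    t_gamma = T_CMB0 * (1.0 + z)
    return (27.0 * x_hi * (omega_b_h2 / 0.023)
            * math.sqrt(0.15 / omega_m_h2 * (1.0 + z) / 10.0)
            * (1.0 - t_gamma / t_s))

z = 17.0
for t_s in (7.0, 3.5):  # illustrative spin temperatures in K
    print(f"T_s = {t_s} K -> T_21 = {t21_mk(z, t_s):.0f} mK")
```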
Interactions with the neutral medium. As a check, we have estimated the interactions of mini-charged particles with the neutral baryonic medium through Lindhard's formula, and found that they are always subdominant, by at least four orders of magnitude.
Other models. In this work we have assumed that mini-charged particles interact through a massless dark photon. However, our results apply to any dark photon lighter than the typical momentum transfer, which is between about 1 eV and 1 keV for DM masses between 1 MeV and 1 GeV. Additionally, we can easily translate our results for a DM mini-charge to a new hadrophilic or leptophilic DM–baryon interaction mediated by a light scalar $\phi$. For $f_{\rm dm} = 1$, we found that a DM mini-charge of $\epsilon = 10^{-8} \times (m_\chi/{\rm MeV})$ is sufficient to decrease the baryonic temperature by a factor of 2 (although we remind the reader that this case is ruled out for mini-charges). Even ignoring DM self-interactions, and setting the $\phi$–DM coupling $g_\chi$ to unity, the $\phi$–nucleon coupling required is
gy = Bnae(%,) 1/7 22x 101 x (m, /MeV) , where X, = 2 x 10 * during the 
era of interest. Similarly, the d-electron coupling required would be 
g,= 8nae(Xm,/m,) |! ~10°x (m,/MeV) . For DM in the MeV to GeV 
range (and thus mediators with $m_\phi \lesssim$ keV), these couplings are constrained by stellar cooling, whereas for lighter DM the mediator would give rise to an anomalous fifth force.

Dark-matter self-thermalization. In this work, we have conservatively assumed that the small fraction of DM that is charged does not thermalize with the rest of the DM. If it did, one could simply rescale our results for $\epsilon$ from the $f_{\rm dm} = 1$ case
by (fam) _ gee 

Code availability. The code used to generate the thermodynamical evolution in 
equations (10a—d) is available upon request. 

Data availability. The datum of 21-cm temperature at $z_{\rm central}$ used in Fig. 2 was obtained from ref. 10.
An increase in the ¹²C + ¹²C fusion rate from resonances at astrophysical energies

A. Tumino, C. Spitaleri, M. La Cognata, S. Cherubini, G. L. Guardo, M. Gulino, S. Hayakawa, I. Indelicato, L. Lamia, H. Petrascu, R. G. Pizzone, S. M. R. Puglia, G. G. Rapisarda, S. Romano, M. L. Sergi, R. Spartà & L. Trache


Carbon burning powers scenarios that influence the fate of stars, such as the late evolutionary stages of massive stars (exceeding eight solar masses) and superbursts from accreting neutron stars. It proceeds through the ¹²C + ¹²C fusion reactions that produce an alpha particle and neon-20 or a proton and sodium-23 (that is, ¹²C(¹²C, α)²⁰Ne and ¹²C(¹²C, p)²³Na) at temperatures greater than 0.4 × 10⁹ kelvin, corresponding to astrophysical energies exceeding a megaelectronvolt, at which such nuclear reactions are more likely to occur in stars. The cross-sections for those carbon fusion reactions (probabilities that are required to calculate the rate of the reactions) have hitherto not been measured at the Gamow peaks below 2 megaelectronvolts because of exponential suppression arising from the Coulomb barrier. The reference rate at temperatures below 1.2 × 10⁹ kelvin relies on extrapolations that ignore the effects of possible low-lying resonances. Here we report the measurement of the ¹²C(¹²C, α0,1)²⁰Ne and ¹²C(¹²C, p0,1)²³Na reaction rates (where the subscripts 0 and 1 stand for the ground and first excited states of ²⁰Ne and ²³Na, respectively) at centre-of-mass energies from 2.7 to 0.8 megaelectronvolts using the Trojan Horse method and the deuteron in ¹⁴N. The cross-sections deduced exhibit several resonances that are responsible for very large increases of the reaction rate at relevant temperatures. In particular, around 5 × 10⁸ kelvin, the reaction rate is boosted to more than 25 times the reference value. This finding may have implications such as lowering the temperatures and densities required for the ignition of carbon burning in massive stars and decreasing the superburst ignition depth in accreting neutron stars to reconcile observations with theoretical models.
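The link between stellar temperature and centre-of-mass energy quoted above follows from the Gamow peak. As a rough cross-check, the sketch below uses the standard non-resonant textbook approximations E0 = 0.1220 (Z1² Z2² μ T9²)^(1/3) MeV and Δ = 0.2368 (Z1² Z2² μ T9⁵)^(1/6) MeV (μ in atomic mass units, T9 in GK). These formulas, the function name and the choice μ = 6 u for ¹²C + ¹²C are our assumptions, not taken from the paper.

```python
import math


def gamow_peak(t9, z1=6, z2=6, mu_amu=6.0):
    """Return (E0, Delta) in MeV for a non-resonant charged-particle reaction.

    Assumed textbook approximations:
      E0    = 0.1220 * (Z1^2 Z2^2 mu T9^2)^(1/3)  MeV  (Gamow-peak energy)
      Delta = 0.2368 * (Z1^2 Z2^2 mu T9^5)^(1/6)  MeV  (1/e full width)
    Defaults correspond to 12C + 12C (Z = 6, reduced mass 6 u).
    """
    x = (z1 * z2) ** 2 * mu_amu
    e0 = 0.1220 * (x * t9 ** 2) ** (1.0 / 3.0)
    delta = 0.2368 * (x * t9 ** 5) ** (1.0 / 6.0)
    return e0, delta


if __name__ == "__main__":
    for t9 in (0.4, 0.5, 1.2):
        e0, delta = gamow_peak(t9)
        print(f"T9 = {t9:.1f} GK: E0 = {e0:.2f} MeV, Delta = {delta:.2f} MeV")
```

At 0.4 GK this gives E0 of about 1.3 MeV and at 0.5 GK about 1.5 MeV, in line with the statement that these temperatures probe energies above a megaelectronvolt. Because the measured cross-sections are resonant, the estimate is only indicative of where the rate integral is weighted.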

We measured the ¹²C(¹⁴N, α²⁰Ne)²H and ¹²C(¹⁴N, p²³Na)²H three-body processes in the quasi-free kinematic regime using the Trojan Horse Method (THM). The THM is an indirect technique with which to measure low-energy nuclear reactions unhindered by the Coulomb barrier and free of electron screening. The experimental and analysis procedures are detailed in the Methods sections 'THM basic features', 'One-level many-channel THM formalism', 'Experimental setup and channel selection' and 'Deuteron momentum distribution'. The experiment was performed at INFN, Laboratori Nazionali del Sud, Italy. A 30-MeV ¹⁴N beam accelerated by the MP Tandem accelerator was delivered onto a carbon target. The detection setup consisted of two silicon telescopes, devoted to the detection of α–d and p–d coincidences. The occurrence and dominance of the quasi-free mechanism were indicated by the agreement between the shapes of the experimental and theoretical deuteron momentum distributions (Extended Data Fig. 1).

The THM experimental yields projected onto the ¹²C–¹²C relative energy variable, the centre-of-mass energy E_cm, are shown as black dots in Fig. 1a (²⁰Ne + α0), Fig. 1b (²⁰Ne + α1), Fig. 1c (²³Na + p0) and Fig. 1d (²³Na + p1). A smooth four-body background due to ¹⁶O + α + α + d was subtracted from the THM yields for the ²⁰Ne + α0,1 channels. Error bars display the statistical errors and, when applicable, the background-subtraction uncertainty, combined in quadrature.


A modified one-level many-channel R-matrix analysis was carried out including the excited states of the ²⁴Mg nucleus reported in Extended Data Table 1. The fraction of the total fusion yield from α and p channels other than α0,1 and p0,1 was neglected, with estimated errors at E_cm < 2 MeV lower than 1% and 2% for the α and p channels, respectively (see Methods section 'Modified R-matrix analysis').

The results are shown in Fig. 1a–d as red lines, with light-red shading indicating the uncertainties on the resonance parameters, including correlations. Agreement with the experimental data is fair, as confirmed by the reduced χ² values of 0.73 for ²⁰Ne + α0, 1.06 for ²⁰Ne + α1, 0.54 for ²³Na + p0 and 1.34 for ²³Na + p1. The resonance structure observed in the excitation functions is consistent with ²⁴Mg level energies reported in the literature, with some tendency for the even-J states to be clustered at about 1.5 MeV. The THM reduced widths were then entered into a standard R-matrix code, and the S(E) factors (see Methods section 'Astrophysical S(E) factor') for the four reaction channels were determined.

The results are shown in Fig. 2a (²⁰Ne + α0), Fig. 2b (²⁰Ne + α1), Fig. 2c (²³Na + p0) and Fig. 2d (²³Na + p1) in terms of the modified S(E) factor, S(E)* (see Methods section 'Astrophysical S(E) factor'). The black line and grey shading in each panel represent the best-fit curve and the range defined by the total uncertainties, respectively. The grey shading is the result of R-matrix calculations with lower and upper values of the resonance parameters given by their errors after combination with the normalization error. Excursions from the midline range from 11% to 20%.

The resonant structures are superimposed onto a flat nonresonant background of 0.4 × 10¹⁶ MeV b. Unitarity of the S matrix is guaranteed within the experimental uncertainties. Normalization to direct data was done in the E_cm window 2.50–2.63 MeV of the ²⁰Ne + α1 channel, where a sharp resonance corresponding to the 16.5-MeV level of ²⁴Mg appears and the available data in this region are the most accurate of those overlapping with the THM data. Scaling to this resonance by means of a weighted normalization gives a normalization error of 5%, which is combined in quadrature with the errors on the resonance parameters to give the grey shading in Fig. 2a–d.
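The weighted normalization and the quadrature combination described above can be illustrated with a generic weighted least-squares scale factor. The paper does not spell out its weighting, so the inverse-variance choice, the function names and all numbers in the example are our own illustrative assumptions.

```python
import math


def weighted_scale(model, data, errors):
    """Return (k, sigma_k) minimizing sum(((data - k * model) / errors) ** 2).

    Inverse-variance weighting is assumed; sigma_k is the 1-sigma error on k.
    """
    num = sum(m * d / e ** 2 for m, d, e in zip(model, data, errors))
    den = sum(m * m / e ** 2 for m, e in zip(model, errors))
    return num / den, 1.0 / math.sqrt(den)


def total_relative_error(*relative_errors):
    """Combine independent relative errors in quadrature."""
    return math.sqrt(sum(r * r for r in relative_errors))


if __name__ == "__main__":
    # Hypothetical THM yields (arbitrary units) and direct S(E)* values inside a
    # normalization window; the numbers are illustrative only.
    thm = [1.0, 2.0, 3.0]
    direct = [2.1, 3.9, 6.0]
    sigma = [0.2, 0.2, 0.3]
    k, dk = weighted_scale(thm, direct, sigma)
    print(f"scale = {k:.3f} +/- {dk:.3f}")
    # A 5% normalization error combined with a hypothetical 12% parameter error
    print(f"total relative error = {total_relative_error(0.05, 0.12):.3f}")
```

Adding relative errors in quadrature assumes they are independent; correlated contributions would need the full covariance instead.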

Existing direct data below E_cm = 3 MeV are shown in Fig. 2 as red filled circles, purple filled squares, blue empty diamonds, blue filled stars and green filled triangles. Their low-energy limit is mostly fixed by the background due to hydrogen contamination in the targets, and the higher S(E) values for the p1 channel in some of them were attributed to Coulomb excitation of ²³Na contamination in the targets or collimators. Disregarding these cases, agreement between the THM results and direct data is apparent within the experimental errors, except at the direct low-energy limit around 2.14 MeV, where the THM data do not confirm the claim of a strong resonance; instead, there is a nearby resonance at 2.095 MeV that is about one order of magnitude less intense in the ²⁰Ne + α1 channel (see Fig. 2b) and of similar intensity in the ²³Na + p1 channel (see Fig. 2d). The present result is in agreement with spectroscopy studies that report a dip at 2.14 MeV and no particularly strong α state at around 2.1 MeV.
Further agreement is found with unpublished experimental data down to E_cm = 2.15 MeV for the ¹²C(¹²C, p0,1)²³Na reactions. Our result is also consistent within experimental errors with the total S(E)* from a recent experiment at higher energies, which was calculated at the overlapping E_cm = 2.68 ± 0.08 MeV.

The reaction rates for the four processes were calculated from the THM S(E)* factors using the standard formula and summed to obtain the total ¹²C + ¹²C reaction rate. Its numerical values are given in Extended Data Table 2 (see Methods section 'Numerical values of the ¹²C + ¹²C reaction rate'). We recommend an analytical expression for the reaction rate and for its upper and lower limits, based on the same formulae as those used in the REACLIB library. This expression is valid in the temperature range 0.1 GK < T < 3 GK with an accuracy better than 0.7% (χ² = 0.1), where the accuracy refers to the maximum difference between the analytical function and the centroids of the experimental points. It is given by:


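$$ N_A\langle\sigma v\rangle = \sum_{i=1}^{3}\exp\left[a_{i1} + a_{i2}T^{-1} + a_{i3}T^{-1/3} + a_{i4}T^{1/3} + a_{i5}T + a_{i6}T^{5/3} + a_{i7}\ln T\right] \qquad (1) $$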


The parameters a_ij, with 1 ≤ i ≤ 3 and 1 ≤ j ≤ 7, are given in Table 1, with subscripts 'u' and 'l' for the upper and lower limits; T is expressed in GK. They result from a fit performed using the NUCASTRODATA toolkit (http://www.nucastrodata.org/).
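Equation (1) is straightforward to evaluate numerically. The sketch below is a minimal illustration, not the authors' code. It assumes that the first three numerical columns of the rows labelled a_i1 to a_i7 in Table 1 are the three terms (i = 1, 2, 3) of the adopted rate, and it reads the partly garbled a_11 entry as 1.22657 × 10². Under these assumptions the sum reproduces the adopted 0.10, 0.20 and 0.35 GK entries of Extended Data Table 2 to better than 1%.

```python
import math

# Hypothetical reading of Table 1: each row is one REACLIB-style term
# (i = 1, 2, 3) of the adopted rate, listing a_i1 ... a_i7 (assumption).
A_ADOPTED = [
    [1.22657e2, 0.557112, -9.05657e1, -6.83561e1, 1.42906e1, -2.43583, 9.32623],
    [9.03221e1, -8.35888, -6.17552e1, -1.07514e2, 7.20344e1, -1.37501e1, -1.91793e1],
    [2.28039e2, -1.16039e1, -2.40364e2, -9.21375e1, 1.25411e2, -3.25984e1, -1.10903e2],
]


def reaclib_term(a, t9):
    """exp[a1 + a2/T + a3*T^(-1/3) + a4*T^(1/3) + a5*T + a6*T^(5/3) + a7*ln T], T in GK."""
    return math.exp(
        a[0]
        + a[1] / t9
        + a[2] * t9 ** (-1.0 / 3.0)
        + a[3] * t9 ** (1.0 / 3.0)
        + a[4] * t9
        + a[5] * t9 ** (5.0 / 3.0)
        + a[6] * math.log(t9)
    )


def rate_12c12c(t9, coeffs=A_ADOPTED):
    """Equation (1): N_A<sigma v> in cm^3 mol^-1 s^-1 for 0.1 <= T9 <= 3."""
    if not 0.1 <= t9 <= 3.0:
        raise ValueError("the fit is quoted as valid only for 0.1 GK <= T <= 3 GK")
    return sum(reaclib_term(a, t9) for a in coeffs)


if __name__ == "__main__":
    for t9 in (0.10, 0.20, 0.35, 0.50, 1.00):
        print(f"T9 = {t9:.2f} GK: N_A<sigma v> = {rate_12c12c(t9):.3e} cm^3 mol^-1 s^-1")
```

The range check mirrors the stated validity window; outside it the seven-parameter form is an extrapolation and should not be trusted.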

The total THM reaction rate was divided by the reference rate, and the resulting ratio is shown in Fig. 3. The black line represents the rate from the present work, with the grey shading defining the region fixed by the total uncertainty (Methods section 'Numerical values of the ¹²C + ¹²C reaction rate'), whereas the red line refers to the reference rate.

The light-blue shading shows the temperature range relevant for superbursts (about 0.4–0.5 GK), the light-red shading highlights typical temperatures for hydrostatic carbon burning in massive stars (about 0.6–1.0 GK in the core and up to 1.2 GK in the shell, depending on the stellar mass), and the light-green shading marks the temperatures of explosive carbon burning (about 1.8–2.5 GK). As shown in Fig. 3, the reaction rate changes below 2 GK, with an increase relative to the reference non-resonant rate from a factor of 1.18 at 1.2 GK (***P < 0.001) to a factor of more than 25 at 0.5 GK (****P < 0.00001). The latter increase, mainly due to the resonances around E_cm = 1.5 MeV, supports the conjectured fiducial value required to reduce the theoretical superburst ignition depths in accreting neutron stars by a factor of 2 for a range of realistic parameters and core neutrino emissivities. This change matches the observationally inferred ignition depths and translates into an ignition temperature below 0.5 GK, compatible with the calculated crust temperature. In other words, carbon burning can trigger superbursts. A similar decrease in temperature is obtained by using the crust Urca shell neutrino emissivities, recently invoked to explain the cooling of the outer neutron star crust while thermally decoupling the surface layers from the deeper crust. Under this hypothesis, current superburst models and predicted light curves require revision, and our finding could represent the missing heat source in the standard carbon ignition scenario.

Fig. 3 | ¹²C + ¹²C reaction rate ratio. Ratio between the total THM ¹²C + ¹²C reaction rate (black line) and the reference one (red line). The grey shading defines the region spanned by the ±1σ uncertainties. The coloured shading marks typical temperature regions for carbon burning in different scenarios: light blue for superbursts from accreting neutron stars, light red for hydrostatic carbon burning in massive stars and light green for explosive carbon burning. Comparison with the red line (non-resonant assumption) gives ***P < 0.001 in the region of hydrostatic burning and ****P < 0.00001 at superburst temperatures.

In the hydrostatic carbon burning regime, the present rate change will lower the temperatures and densities at which ¹²C ignites in massive post-main-sequence stars. Using stellar modelling of core carbon burning in a star of 25 solar masses, we determine that the ignition temperature and density would decrease by 10% and 30%, respectively. This would reduce the neutrino losses, lengthening the carbon burning phase by up to a factor of 70. The new rate would also affect the abundances of species that are the main fuel for subsequent evolutionary phases. However, such abundances are also influenced by the ratio of the α to p yields if it deviates from unity. From the present experiment, the average value of this ratio is around 2; in particular, it is 1.6 ± 0.4 at 0.8 GK and 2.2 ± 0.6 at 2 GK. The ¹²C + ¹²C rate is also the most important nuclear physics input governing the minimum stellar mass M_up required for hydrostatic carbon burning to occur. M_up is fundamental to our understanding of, for instance, the evolution of supernova progenitors and white dwarf luminosity functions. From the present result, we consider that the value of M_up will not be strongly affected, in contrast to what has been predicted when assuming a much larger increase (up to nine orders of magnitude) in the reaction rate, but it is worth noting that stellar models are also very sensitive to small changes of this parameter. However, a sound evaluation of M_up requires a better understanding of the ratio of the initial mass to the final core mass.

Below 0.4 GK the rate increases enormously, by up to a factor of 800, owing to the lowest-energy resonances around E_cm = 1 MeV. It has been conjectured that the existence of such low-energy resonances might shift the ignition curve of type Ia supernovae to lower central densities. This should be assessed for the various progenitor scenarios. Much additional work is needed

Table 1 | Coefficients of the analytical function of the 12C + 12C reaction rate using equation (1) 


aij fh fo fg fu 


fou fay fy fo fay 


9.03982 x 101 
—8.35720 
—6.17282 x 10! 
—1.07358 x 10? 

7.20835 x 10! 
—1.38060 x 10! 
—1.91920 x 10! 


3.14593 x 10° 
—2.26169 x 10! 
1.36110 x 10° 
—5.16494 x 10° 
7.85965 x 102 
—1.29447 x 10? 
1.60224 x 108 


6.08741 x 102 
—1.42976 x 10? 
3.43845 x 10? 
—1.11874 x 108 
1.73098 x 10? 
—2.33743 x 10! 
3.60334 x 10? 


3.21570 x 10? 
—0.815182 
3.17671 x 10 
—4,.22173 x 10? 
5.23691 x 10! 
—6.35869 
1.34509 x 102 


2.28056 x 102 
—1.15681 x 10! 
—2.40343 x 10? 
—9.21156 x 10! 

1.25484 x 102 
—3.24417 x 10! 
—1.10961 x 102 


a_i1    1.22657 × 10²    9.03221 × 10¹    2.28039 × 10²    1.22687 × 10²
a_i2    0.557112    −8.35888    −1.16039 × 10¹    0.557664
a_i3    −9.05657 × 10¹    −6.17552 × 10¹    −2.40364 × 10²    −9.05616 × 10¹
a_i4    −6.83561 × 10¹    −1.07514 × 10²    −9.21375 × 10¹    −6.83178 × 10¹
a_i5    1.42906 × 10¹    7.20344 × 10¹    1.25411 × 10²    1.42891 × 10¹
a_i6    −2.43583    −1.37501 × 10¹    −3.25984 × 10¹    −2.46506
a_i7    9.32623    −1.91793 × 10¹    −1.10903 × 10²    9.35304

Coefficients of the analytical function (equation (1)) of the ¹²C + ¹²C reaction rate and of its upper and lower limits. They result from a fit of the numerical values given in Extended Data Table 2 using the reaction rate parameterizer from the NUCASTRODATA toolkit (http://www.nucastrodata.org/).

to determine the impact that the new ¹²C + ¹²C reaction rate will have in various astrophysical contexts.


Online content 

Any Methods, including any statements of data availability and Nature Research 
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at https://doi.org/10.1038/s41586- 

METHODS


THM basic features. The THM is an indirect technique aimed at measuring low-energy nuclear reactions unhindered by the Coulomb barrier and free of electron screening. It has been used to study several reactions related to fundamental astrophysical problems. In the THM, the low-energy cross-section of an A(x,b)B reaction is determined by selecting the quasi-free contribution of a suitable A(a,bB)s reaction that is measured. In quasi-free kinematics, particle a, chosen for its xs cluster structure, is used to transfer the participant cluster x to induce the reaction with A, while the other constituent cluster s remains a spectator to the A(x,b)B sub-process. Because the transferred nucleus x is virtual, its energy and momentum are not linked by the usual energy–momentum relation for a free particle. This gives the A(x,b)B reaction its half-off-the-energy-shell (HOES) character. The quasi-free A(a,bB)s reaction can be sketched using a pole diagram (see Extended Data Fig. 2) with two vertices referring to the break-up of a (upper vertex) and to the A(x,b)B process (lower vertex). The A + a relative motion takes place at an energy above the Coulomb barrier, ensuring that the transfer of particle x occurs inside the nuclear field of A without undergoing Coulomb suppression or electron screening. However, the A + x reaction takes place at the sub-Coulomb relative energy E_cm because the excess of energy in the A + a relative motion is needed for the break-up of the Trojan Horse nucleus a = (xs). From energy and momentum conservation, we obtain:
2 


| eel ae ee 


cm 
my, + My 


+B B,. (1) 
my, + My 


with m_i and p_i the mass and momentum of particle i, μ_ij = m_i m_j/(m_i + m_j) the reduced mass of particles i and j, F the compound system (F = A + x = b + B) and B_xs = m_x + m_s − m_a the binding energy of clusters x and s inside a. E_cm can vary within a range determined by the momentum of the spectator particle, p_s, or by its emission angle. The values of p_s should not exceed the theoretical upper limit for the relative momentum p_xs between x and s (in the laboratory system, p_xs = p_x = −p_s), represented by the on-the-energy-shell bound-state wave number κ_xs = (2μ_xs B_xs)^{1/2}. This is the condition for the quasi-free mechanism to be dominant, that is, for the HOES cross-section to approach the on-energy-shell cross-section while minimizing distortions. For the ¹⁴N = (¹²C d) system, κ_xs = 181 MeV c⁻¹ (where c is the velocity of light), by far exceeding the experimental p_s upper limit of about 80 MeV c⁻¹, which is fixed by the phase space populated in the present experiment.
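In the plane-wave impulse approximation, the three-body cross-section can be factorized into two terms corresponding to the vertices of Extended Data Fig. 2, and is given by:

$$ \frac{d^{3}\sigma}{dE_{\rm cm}\,d\Omega_b\,d\Omega_s} \propto \mathrm{KF}\,\left|\Phi(p_s)\right|^{2}\left(\frac{d\sigma}{d\Omega}\right)^{\rm HOES}_{xA\to bB} \qquad (2) $$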
where KF is a kinematical factor containing the final-state phase-space factor, which is a function of the masses, momenta and angles of the outgoing particles; |Φ(p_s)|² is the squared Fourier transform of the radial wave function χ(r_xs) for the x–s inter-cluster motion, whose functional dependence is fixed by the properties of the xs system; and (dσ/dΩ)^HOES_{xA→bB} is the HOES cross-section of the binary reaction.
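The quasi-free condition p_s < κ_xs can be checked with a few lines of arithmetic. The sketch below approximates the ¹²C and deuteron masses by their mass numbers in u and uses a ¹⁴N → ¹²C + d binding energy of 10.272 MeV; both inputs, and the function names, are our assumptions rather than values quoted in the text. With them the code recovers κ_xs ≈ 181 MeV c⁻¹.

```python
import math

AMU_MEV = 931.494  # atomic mass unit in MeV/c^2


def bound_state_wave_number(m_x_u, m_s_u, b_xs_mev):
    """kappa_xs = sqrt(2 * mu_xs * B_xs) in MeV/c, with masses given in u."""
    mu_xs = m_x_u * m_s_u / (m_x_u + m_s_u) * AMU_MEV  # reduced mass, MeV/c^2
    return math.sqrt(2.0 * mu_xs * b_xs_mev)


def quasi_free_dominant(p_s_max, kappa_xs):
    """True when the largest spectator momentum stays below the bound-state wave number."""
    return p_s_max < kappa_xs


if __name__ == "__main__":
    # 14N = (12C + d); masses approximated by mass numbers (assumed inputs)
    kappa = bound_state_wave_number(12.0, 2.0, 10.272)
    print(f"kappa_xs = {kappa:.0f} MeV/c")
    print("quasi-free condition met for p_s <= 80 MeV/c:", quasi_free_dominant(80.0, kappa))
```

Using exact nuclear masses instead of mass numbers shifts κ_xs by well under 1%, so the conclusion that 80 MeV c⁻¹ lies far below κ_xs does not depend on that choice.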




One-level many-channel THM formalism. For a multi-resonance A(x,b)B reaction, the so-called modified R-matrix approach has been developed to account for its HOES nature in the extraction of the reduced widths γ from the THM reaction yield. Because the transferred particle does not obey the mass-shell equation, no entrance-channel penetration factor is present, making it possible to reach astrophysical energies without extrapolation. Yet the same reduced widths appear in the THM and in the on-energy-shell cross-sections, so those extracted from THM data can be used to determine the direct S(E) factor, free of HOES effects. For isolated non-interfering resonances, the one-level many-channel formula can be used, so that the THM A(x,b)B cross-section in the plane-wave impulse approximation takes the form:


P x 2 
Z / 7 f2PeMi(p Rya¥in Ver 
Poa c! = NEY) (+1) [i Pra Rea) aac (3) 
with NF a normalization factor; k_c' and P_c' the exit-channel wave number and penetration factor (c' runs over all exit channels); and E_xA and R_xA the x–A entrance-channel relative energy and channel radius, the latter set to 7.25 fm:
where j_l(p) is the spherical Bessel function for the l_i wave, p_xA = [2μ_xA(E_xA + B_xs)]^{1/2}, B_xA is an arbitrary boundary condition chosen to reproduce the observable resonance parameters, and D_i(E_xA) is the R-matrix denominator of one-level multi-channel formulas:


$$ D_i(E_{xA}) = E_i - E_{xA} - \sum_c \gamma_{ic}^{2}\,\left[S_c(E_{xA}) - B_c\right] - i\sum_c P_c(E_{xA})\,\gamma_{ic}^{2} \qquad (5) $$


with the sums running over all the open channels c; S_c and B_c are the shift function and the boundary condition for channel c, and γ_ic is the reduced width for the ith resonance and channel c, which enters the calculation of the on-energy-shell S(E) factor free of electron screening and is not affected by the experimental energy resolution.
Experimental setup and channel selection. A ¹⁴N beam at 30 MeV was delivered onto a carbon target, 100 μg cm⁻² thick, with a spot size of 1 mm. The silicon telescopes were made up of a 38-μm ΔE detector and a 1,000-μm position-sensitive E detector (with intrinsic σ resolution quoted as 0.3 mm for the position and about 0.5% for the energy) to measure the residual energy. They were placed symmetrically on either side of the beam direction, each covering laboratory angles of 8° to 30°, and were devoted to the detection of α–d and p–d coincidences. Angular conditions were selected to maximize the expected quasi-free contribution, fulfilling the requirement for the spectator particle d to retain its initial momentum inside ¹⁴N. Channel selection was accomplished by gating on the ΔE–E two-dimensional plots to select coincident d and α (p) loci. A typical ΔE–E spectrum is shown in Extended Data Fig. 3, where p, d and α loci are clearly visible. Kinematics were reconstructed under the assumption of either ²⁰Ne (for the α + d channel) or ²³Na (for the p + d channel) as the undetected particle. The Q-value variable was plotted as a function of a kinematic variable, such as the energy or the angle of any one of the particles involved. In this representation, coincidence events of interest should lie on a horizontal line that cuts the Q-value axis at the expected value, because the Q-value depends only on the masses of the particles involved. A typical spectrum for the present experiment is shown in Extended Data Fig. 4 for the ¹²C(¹⁴N, α²⁰Ne)²H reaction, where the Q-value is plotted as a function of the α detection angle. Two dominant sharp horizontal loci appear, corresponding to the ground and first excited states of ²⁰Ne. They are highlighted by blue and red solid lines crossing the Q-value axis at −5.65 MeV and −7.28 MeV, respectively. This spectrum gives us confidence in the quality of the calibration and in the correct selection of the reaction channel. Further data analysis was restricted to such events.
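The expected positions of the Q-value loci follow from mass excesses alone: the three-body Q-value equals Q2 − B_xs, where Q2 is the Q-value of the binary ¹²C + ¹²C reaction and B_xs the ¹⁴N → ¹²C + d binding energy. The sketch below uses rounded atomic mass excesses from standard mass tables and first-excited-state energies of 1.634 MeV (²⁰Ne) and 0.440 MeV (²³Na); these values and the helper names are our assumptions, not numbers from the paper. It reproduces the −5.65 MeV and −7.28 MeV loci to within about 0.01 MeV.

```python
# Rounded atomic mass excesses in MeV (assumed values from standard mass tables).
MASS_EXCESS_MEV = {
    "p": 7.28897,
    "d": 13.13572,
    "alpha": 2.42492,
    "12C": 0.0,
    "14N": 2.86342,
    "20Ne": -7.04193,
    "23Na": -9.52985,
}


def q_value(initial, final):
    """Q = sum of initial mass excesses minus sum of final mass excesses (MeV)."""
    return sum(MASS_EXCESS_MEV[n] for n in initial) - sum(MASS_EXCESS_MEV[n] for n in final)


def three_body_q(light, heavy, excitation_mev=0.0):
    """Q-value of 12C(14N, light heavy)d, optionally leaving `heavy` in an excited state."""
    return q_value(["12C", "14N"], [light, heavy, "d"]) - excitation_mev


if __name__ == "__main__":
    q2 = q_value(["12C", "12C"], ["alpha", "20Ne"])  # binary 12C(12C, alpha)20Ne
    b_xs = -q_value(["14N"], ["12C", "d"])           # 14N -> 12C + d binding energy
    print(f"Q2 = {q2:.3f} MeV, B_xs = {b_xs:.3f} MeV, Q2 - B_xs = {q2 - b_xs:.3f} MeV")
    print(f"alpha0 locus: {three_body_q('alpha', '20Ne'):.3f} MeV")
    print(f"alpha1 locus: {three_body_q('alpha', '20Ne', 1.634):.3f} MeV")
    print(f"p0 locus:     {three_body_q('p', '23Na'):.3f} MeV")
    print(f"p1 locus:     {three_body_q('p', '23Na', 0.440):.3f} MeV")
```

Electron masses cancel because atomic mass excesses are used on both sides of a charge-conserving reaction.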
Deuteron momentum distribution. The d momentum distribution is a physical quantity very sensitive to the reaction mechanism. It keeps the same shape as inside ¹⁴N only if the latter undergoes quasi-free break-up. The agreement between the shapes of the theoretical and experimental momentum distributions is thus a compelling signature of the occurrence of the quasi-free mechanism. To determine the d momentum distribution from the coincidence yield, the modulation due to possible contributions of ²⁴Mg states has to be removed. This is done over restricted ranges of E_cm and θ_cm of less than 30 keV and 5°, respectively. The kinematic factor KF, describing the phase-space population, is divided out by performing a Monte Carlo simulation of the experimental setup with the angular ranges and detection thresholds of the experiment. The momentum distribution from the ¹²C(¹⁴N, α0²⁰Ne)²H reaction is shown as an example in Extended Data Fig. 1 as black filled circles. Data are projected in 8 MeV c⁻¹ bins over the momentum axis of the detected deuteron, p_d, with error bars including statistical errors only. The solid black line in the figure represents the theoretical behaviour normalized to the experimental data. It is obtained from a Woods–Saxon ¹²C–d bound-state potential with standard geometrical parameters r0 = 1.25 fm, a = 0.65 fm and V0 = 54.428 MeV, adjusted to give the experimental ground-state ¹²C–d binding energy in ¹⁴N. Fair agreement (χ² = 0.2) shows up, indicating that in the phase-space region spanned in the present experiment the reaction mainly proceeds through a direct ¹²C transfer. Thus, the plane-wave impulse approximation factorization of equation (2) can be relied on for the present investigation, because no distortions are needed within experimental errors to describe the transfer process. This result agrees with previous work in which a strong transfer component was found in similar kinematic conditions with d detected at forward angles. We remark that in the present experiment the d centre-of-mass angular range is about 11°–50° and the coincidence mode triggers event acquisition. In those papers, it was taken into account that the transferred ¹²C can also be found in its first excited 2⁺ state at 4.44 MeV. From angular distribution analysis using a general expression for resonance reactions, there is no evidence in our experimental data of ¹²C transfer in its first excited 2⁺ state at 4.44 MeV; only transfer of ¹²C in its ground state contributes. This result will be discussed in a future paper.

From the shape analysis of the momentum distribution, we could estimate the possible contribution of reaction mechanisms other than the quasi-free one to the extracted experimental yield. In particular, other contributing mechanisms, such as 'compound nucleus' or 'multistep transfer', are represented by an isotropic momentum distribution as a signature of loss of correlation in the deuteron momentum. Thus, the experimental shape of the momentum distribution was fitted with a linear combination of the theoretical function for the quasi-free mechanism and a constant, leaving the coefficients as free parameters. The covariance-matrix fit returns a contribution consistent with zero for the constant function, within an uncertainty of 3% at the 2σ level, including correlations. This contribution to the overall uncertainty was neglected in the further extraction of cross-sections and reaction rates.
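The shape test described above amounts to a weighted linear least-squares fit of the data with c1·f(p_d) + c0, where f is the quasi-free momentum distribution. The sketch below solves the 2 × 2 normal equations and returns the covariance matrix of (c1, c0). The Gaussian-like placeholder standing in for f, the 8 MeV c⁻¹ grid and the noise-free synthetic data are assumptions for illustration only; the paper's f comes from a Woods–Saxon bound state.

```python
import math


def fit_shape_plus_constant(shape, data, errors):
    """Weighted least squares for data ~ c1 * shape + c0.

    Returns (c1, c0, cov), where cov is the 2x2 covariance matrix of (c1, c0).
    """
    w = [1.0 / e ** 2 for e in errors]
    s_ff = sum(wi * f * f for wi, f in zip(w, shape))
    s_f = sum(wi * f for wi, f in zip(w, shape))
    s_1 = sum(w)
    s_fy = sum(wi * f * y for wi, f, y in zip(w, shape, data))
    s_y = sum(wi * y for wi, y in zip(w, data))
    det = s_ff * s_1 - s_f * s_f  # determinant of the normal-equation matrix
    c1 = (s_fy * s_1 - s_f * s_y) / det
    c0 = (s_ff * s_y - s_f * s_fy) / det
    cov = [[s_1 / det, -s_f / det], [-s_f / det, s_ff / det]]
    return c1, c0, cov


if __name__ == "__main__":
    p_d = [8.0 * k for k in range(11)]                 # 0-80 MeV/c in 8 MeV/c bins
    shape = [math.exp(-(p / 40.0) ** 2) for p in p_d]  # placeholder quasi-free shape
    data = [5.0 * f for f in shape]                    # synthetic, purely quasi-free
    errors = [0.05] * len(data)
    c1, c0, cov = fit_shape_plus_constant(shape, data, errors)
    print(f"c1 = {c1:.4f}, c0 = {c0:.2e} +/- {2.0 * math.sqrt(cov[1][1]):.2e} (2 sigma)")
```

A constant term compatible with zero, as in this synthetic case, is what signals a negligible isotropic (non-quasi-free) contribution.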

Modified R-matrix analysis. Level energies and J^π values were taken from the literature. In particular, the J^π assignments for the most prominent peaks in the present experiment were checked through angular distribution analysis. Because the widths of the ²⁴Mg states at the relevant excitation energies are smaller than their average spacing (for example, at E_cm ≈ 1.5 MeV, Γ/D < 0.4, with Γ and D the level widths and spacings, respectively) and, most importantly, owing to the smoothing effect of our experimental resolution, interference between them was not taken into account. Indeed, the energy resolution removes the problem of interference, making our result insensitive to it. Thanks to the non-overlapping nature of the levels involved, integration of d³σ/dE_cm dΩ_d dΩ_cm over the α (p) emission angle in the centre-of-mass system of the ¹²C(¹²C, α0,1)²⁰Ne and ¹²C(¹²C, p0,1)²³Na sub-reactions could be easily performed. Because the experimental θ_cm range of the sub-reactions is about 140° to 180°, angular distributions outside this angular region were calculated by means of the general expression for resonance reactions. If one considers the α and p fractions of the total fusion yield observed at E_cm > 2.8 MeV, the lower limits of the α0 + α1 and p0 + p1 contributions to the total cross-sections from the present experiment at the highest energies are 0.85 ± 0.07 and 0.68 ± 0.06, respectively. However, the number of accessible excited states for both ²⁰Ne and ²³Na already halves between E_cm = 2.8 MeV and 1.5 MeV, and the cross-sections for the ²⁰Ne and ²³Na excited states drop more steeply than those for the ground states, owing to the sharper decrease (by orders of magnitude) of the corresponding penetration factors. Monitoring the decrease of the penetration factors for the relevant states, and according to the results at E_cm < 3 MeV, the contribution to the total fusion yield from α and p channels other than α0,1 and p0,1 was neglected in the modified R-matrix analysis, with uncertainties at E_cm < 2 MeV lower than 1% and 2% for the α and p channels, respectively.

For all of the states involved in the procedure, the total widths are known, as is, in several cases, one of the partial widths (usually the α partial width). Thus, the normalization constant NF in equation (3) and the missing partial widths were the only free parameters used to match the modified R-matrix calculations with the indirect data for the four channels. Each calculated cross-section was folded with a Gaussian function having σ = 30 keV to account for the energy resolution, as calculated from the beam spot size and divergence, the intrinsic energy and angle resolution of the position-sensitive E detector, and the energy and angular straggling in the target and dead layers. Total and partial widths and related uncertainties resulting from the fit are listed in Extended Data Table 1 for all levels entering the calculation. Uncertainties account for the error budget affecting the experimental data (statistical and, when applicable, from background subtraction) and for correlations among the resonances in the four reaction channels, and range from 10% to about 20%.
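Folding a calculated cross-section with the σ = 30 keV resolution Gaussian can be sketched as a discrete convolution on a uniform energy grid. This is a minimal illustration under our own assumptions (uniform grid, kernel truncated at ±5σ, a hypothetical unit-area Breit–Wigner test shape); it is not the authors' R-matrix code.

```python
import math


def gaussian_fold(energies, values, sigma):
    """Convolve values(E) with a unit-area Gaussian of standard deviation sigma.

    Assumes a uniform energy grid; the kernel is truncated at +/-5 sigma, and
    points beyond the grid edges are treated as zero.
    """
    step = energies[1] - energies[0]
    half = int(math.ceil(5.0 * sigma / step))
    norm = step / (sigma * math.sqrt(2.0 * math.pi))
    kernel = [norm * math.exp(-0.5 * (k * step / sigma) ** 2) for k in range(-half, half + 1)]
    folded = []
    for j in range(len(values)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            i = j + k - half
            if 0 <= i < len(values):
                acc += values[i] * weight
        folded.append(acc)
    return folded


def lorentzian(e, e_r, gamma):
    """Unit-area Breit-Wigner shape centred at e_r with full width gamma (illustrative)."""
    return (gamma / (2.0 * math.pi)) / ((e - e_r) ** 2 + gamma ** 2 / 4.0)


if __name__ == "__main__":
    grid = [1.0 + 0.001 * n for n in range(1001)]       # E_cm from 1.0 to 2.0 MeV
    narrow = [lorentzian(e, 1.5, 0.010) for e in grid]  # hypothetical 10-keV-wide level
    smeared = gaussian_fold(grid, narrow, 0.030)        # 30 keV resolution
    print(f"peak height {max(narrow):.1f} -> {max(smeared):.1f}")
    print(f"area {sum(narrow) * 0.001:.4f} -> {sum(smeared) * 0.001:.4f}")
```

Folding lowers and broadens a narrow peak while approximately preserving its area, which is why the resolution washes out interference effects between neighbouring levels.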
Astrophysical S(E) factor. This factor is introduced to remove the dominant energy dependence of the cross-section between charged particles at astrophysical energies, which is due to Coulomb barrier penetration. The S(E) factor (in units of MeV b) is defined through the relationship:

$$ S(E) = E\,\sigma(E)\,\exp\left[2\pi\eta(E)\right] \qquad (6) $$

where E is the incident energy in the centre-of-mass system, σ(E) is the energy-dependent cross-section and exp(2πη) is the inverse of the Gamow factor, with the Sommerfeld parameter η(E) = Z₁Z₂α(μc²/2E)^{1/2} (where Z₁ and Z₂ are the charges of the colliding nuclei, α is the fine-structure constant, μ is the reduced mass in atomic mass units and c is the velocity of light).

For s-wave non-resonant reactions, the S(E) factor is nearly independent of 
energy and it is the conventional quantity used to extrapolate to low energies. 


For the ¹²C + ¹²C reaction, it is customary to use the so-called modified S(E) factor, S(E)*, which displays resonances more clearly. It is defined as:

$$ S(E)^{*} = E\,\sigma(E)\,\exp\left(87.21\,E^{-1/2} + 0.46\,E\right) \qquad (7) $$


where E is in MeV and the exponential term is the inverse of the Gamow factor with a correction arising from the second term in the Coulomb barrier approximation. In particular, the numerical factor 0.46 is the value of the size factor g, which depends on the reduced mass M₁M₂/(M₁ + M₂), the nuclear separation R₀ and the charges Z₁ and Z₂ of the colliding nuclei, with M₁ and M₂ their masses.
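Equations (6) and (7) translate directly into code. The sketch below computes 2πη from constants we supply (α = 1/137.036, 931.494 MeV per u and μ = 6 u for ¹²C + ¹²C) and applies both definitions to a cross-section in barns; for the modified factor it adopts the conventional coefficients 87.21 and 0.46. For these inputs 2πη ≈ 87.26 E^{−1/2} (E in MeV), close to the coefficient of E^{−1/2} in equation (7). The cross-section in the example is hypothetical.

```python
import math

ALPHA = 1.0 / 137.035999  # fine-structure constant
AMU_MEV = 931.494         # atomic mass unit, MeV/c^2


def two_pi_eta(e_mev, z1=6, z2=6, mu_u=6.0):
    """2*pi*eta(E) with eta = Z1 Z2 alpha sqrt(mu c^2 / 2E); defaults are 12C + 12C."""
    return 2.0 * math.pi * z1 * z2 * ALPHA * math.sqrt(mu_u * AMU_MEV / (2.0 * e_mev))


def s_factor(e_mev, sigma_b):
    """Equation (6): S(E) = E sigma(E) exp(2 pi eta), in MeV b."""
    return e_mev * sigma_b * math.exp(two_pi_eta(e_mev))


def modified_s_factor(e_mev, sigma_b, c_coul=87.21, c_size=0.46):
    """Modified factor S(E)* = E sigma(E) exp(c_coul / sqrt(E) + c_size * E), in MeV b."""
    return e_mev * sigma_b * math.exp(c_coul / math.sqrt(e_mev) + c_size * e_mev)


if __name__ == "__main__":
    print(f"2*pi*eta*sqrt(E) = {two_pi_eta(1.0):.2f} MeV^(1/2)")
    sigma_b = 1.0e-10  # hypothetical cross-section of 0.1 nb at E_cm = 2 MeV
    print(f"S  = {s_factor(2.0, sigma_b):.3e} MeV b")
    print(f"S* = {modified_s_factor(2.0, sigma_b):.3e} MeV b")
```

Because both factors divide out the same steep Coulomb suppression, they differ mainly through the 0.46 E size-factor term, which is why S(E)* shows resonances on a nearly flat baseline.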

Numerical values of the ¹²C + ¹²C reaction rate. Because the total ¹²C + ¹²C fusion yield at E_cm < 2.8 MeV is likely to be exhausted by the α0,1 and p0,1 channels (see Methods section 'Modified R-matrix analysis'), we assume that the sum of their reaction rates in the E_cm range investigated here is representative of the total rate. The numerical values of the total ¹²C + ¹²C reaction rate are given in Extended Data Table 2, expressed in units of cm³ mol⁻¹ s⁻¹ at temperatures of T = 0.1–3 GK. The lower and upper limits are computed using the total uncertainty derived by combining the rate uncertainties of the four investigated channels in quadrature. Each channel propagates the uncertainty in the THM S(E)* factor. The last column of Extended Data Table 2 gives the exponent of the power-of-ten factor multiplying the three preceding columns.
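The standard rate formula and the quadrature combination of channel uncertainties can be sketched as follows. This is our illustration, not the authors' code: it integrates N_A⟨σv⟩ = N_A (8/πμ)^{1/2} (kT)^{−3/2} ∫ σ(E) E exp(−E/kT) dE on a uniform grid with σ(E) = S(E) exp(−2πη)/E and, as a self-check, compares a constant-S case with the usual Gaussian (Gamow-peak) approximation. The constants, grid limits, test S factor and example channel numbers are assumptions.

```python
import math

N_A = 6.02214076e23        # Avogadro constant, mol^-1
C_CM_S = 2.99792458e10     # speed of light, cm s^-1
K_MEV_PER_GK = 0.08617333  # Boltzmann constant, MeV per GK
AMU_MEV = 931.494          # atomic mass unit, MeV/c^2
BARN_CM2 = 1.0e-24
ALPHA = 1.0 / 137.035999


def _prefactor(t9, mu_u):
    """N_A * c * sqrt(8 / (pi mu c^2)) * (kT)^(-3/2), with energies in MeV."""
    kt = K_MEV_PER_GK * t9
    return N_A * C_CM_S * math.sqrt(8.0 / (math.pi * mu_u * AMU_MEV)) * kt ** -1.5


def _gamow_b(z1, z2, mu_u):
    """b such that 2*pi*eta = b / sqrt(E), with E in MeV."""
    return 2.0 * math.pi * z1 * z2 * ALPHA * math.sqrt(mu_u * AMU_MEV / 2.0)


def rate_from_s(s_of_e, t9, z1=6, z2=6, mu_u=6.0, e_min=0.01, e_max=20.0, n=40000):
    """N_A<sigma v> in cm^3 mol^-1 s^-1 from an S factor in MeV b (trapezoidal rule)."""
    kt = K_MEV_PER_GK * t9
    b = _gamow_b(z1, z2, mu_u)
    h = (e_max - e_min) / n
    total = 0.0
    for k in range(n + 1):
        e = e_min + k * h
        weight = 0.5 if k in (0, n) else 1.0
        # sigma(E) * E * exp(-E/kT) = S(E) * exp(-2 pi eta - E/kT)
        total += weight * s_of_e(e) * math.exp(-b / math.sqrt(e) - e / kt)
    return _prefactor(t9, mu_u) * total * h * BARN_CM2


def rate_gamow_constant_s(s0, t9, z1=6, z2=6, mu_u=6.0):
    """Gaussian approximation around the Gamow peak for a constant S factor s0 (MeV b)."""
    kt = K_MEV_PER_GK * t9
    b = _gamow_b(z1, z2, mu_u)
    e0 = (b * kt / 2.0) ** (2.0 / 3.0)
    tau = 3.0 * e0 / kt
    delta = 4.0 * math.sqrt(e0 * kt / 3.0)
    return _prefactor(t9, mu_u) * s0 * BARN_CM2 * math.sqrt(math.pi) / 2.0 * delta * math.exp(-tau)


def combine_channels(rates, errors):
    """Total rate and its lower/upper limits with channel errors added in quadrature."""
    total = sum(rates)
    err = math.sqrt(sum(e * e for e in errors))
    return total, total - err, total + err


if __name__ == "__main__":
    s0 = 1.0e16  # hypothetical constant S factor, MeV b
    numeric = rate_from_s(lambda e: s0, 1.0)
    approx = rate_gamow_constant_s(s0, 1.0)
    print(f"numeric {numeric:.3e}, Gamow approximation {approx:.3e} cm^3 mol^-1 s^-1")
    print(combine_channels([1.0, 2.0], [0.3, 0.4]))  # hypothetical channel values
```

For a resonant S(E)* such as the one measured here, the Gaussian approximation is no longer adequate and the numerical integral is the appropriate route.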

Data availability. All relevant data are available from the corresponding author 
on reasonable request. Data to calculate the rate ratio in Fig. 3 are included in 
Extended Data Table 2. 

Code availability. We have not made publicly available the code for the modified 
R-matrix calculation because it is not intended for open use. However, it is available 
from the corresponding author upon request. 

Extended Data Fig. 1 | Deuteron momentum distribution. The experimental distribution D(p_d) is shown as filled black circles. Error bars represent standard 1σ uncertainties. The black line represents the theoretical shape (see text for details).

Extended Data Fig. 2 | Pole diagram describing the quasi-free mechanism in the A(a,bB)s reaction. The upper vertex refers to the break-up of a and the lower vertex shows the A(x,b)B process. Colours help to highlight the role of individual particles in the mechanism.

Extended Data Fig. 3 | Typical ΔE–E spectrum. The strongest loci, from bottom to top, correspond to p, d and α. ADC, analogue-to-digital converter.

Extended Data Fig. 4 | Q-value as a function of the α detection angle θ_α for the ¹²C(¹⁴N, α²⁰Ne)²H reaction. Blue and red solid lines cross the Q-value axis at −5.65 MeV and −7.28 MeV, highlighting the contributions of the ground and first excited states, respectively.

Extended Data Table 1 | Resonance parameters of ²⁴Mg levels entering the R-matrix fit and total plus partial widths resulting from the fit
29.3 ± 6.1   10.7 ± 2.2
42.8 ± 4.8   4.8 ± 0.9
6.91 ± 0.95   2.0 ± 0.4
9.6 ± 1.1   0.9 ± 0.2
37.5 ± 5.4   15.1 ± 2.7
22.4 ± 2.8   4.6 ± 0.8
0.49 ± 0.07   0.18 ± 0.03
37.9 ± 5.6   13.6 ± 2.5
7.3 ± 1.2   3.3 ± 0.6
31.6 ± 4.3   1.0 ± 0.2
f2t11,   1.1 ± 0.3
43.9 ± 4.4   15.2 ± 2.0
43.4 ± 7.6   4.0 ± 0.7
87416   2.3 ± 0.9
30.6 ± 4.6   9.0 ± 1.6
4.8 ± 0.6   0.38 ± 0.07
0.019 ± 0.003
0.31 ± 0.04
0.7 ± 0.1
0.032 ± 0.005
Oo LeL3
0.7 ± 0.1
2.2 ± 0.3
0.6 ± 0.08
7 AELS
52411
2.0 ± 0.4
5.621..1
(keV)
36.5 ± 3.7
3.9 ± 0.4
8.6 ± 0.9
18.0 ± 1.9
15.4 ± 1.6
0.22 ± 0.02
16.2 ± 1.7
2.2 ± 0.2
18.8 ± 1.9
26.5 ± 1.8
0.40 ± 0.08
Quoted uncertainties are ±1σ. Parameters are the centre-of-mass energy Ecm, the excitation energy Ex, the spin and parity Jπ, the total width Γtot, the α0 partial width Γα0, the α1 partial width Γα1, the p0 partial width Γp0 and the p1 partial width Γp1.
Extended Data Table 2 | Reaction rate of the 12C + 12C fusion reaction

Reaction rate (cm³ s⁻¹ mol⁻¹)

T (GK)   Adopted   Lower   Upper   Power
0.10     2.77      2.77    ZAI     -52
0.11     7.80      7.69    7.91    -50
0.12     1.71      1.52    1.89    -47
0.13     4.59      3.55    5.64    -45
0.14     0.90      0.66    1.14    -42
0.15     0.97      0.71    1.24    -40
0.16     6.00      4.33    7.66    -39
0.18     5.85      4.21    7.49    -36
0.20     1.46      1.05    1.86    -33
0.25     3.12      2.25    3.99    -29
0.30     2.62      1.89    3.36    -26
0.35     3.91      2.82    5.01    -24
0.40     3.05      2.19    3.90    -22
0.45     1.95      1.40    2.49    -20
0.50     7.65      5.51    9.80    -19
0.60     2.49      1.79    3.19    -16
0.70     1.90      1.37    2.44    -14
0.80     5.97      4.30    7.63    -13
0.90     1.05      0.76    1.34    -11
1.00     1.24      0.91    1.56    -10
1.25     1.88      1.53    aide    -08
1.50     1.03      0.93    1.11    -06
1.75     2.79      2.69    2.89    -05
2.00     4.41      4.35    4.47    -04
2.50     3.34      3.30    3.35    -02
3.00     0.86      0.86    0.86    00

The recommended value and the 1σ lower and upper limits were computed at T = 0.1–3 GK, covering the relevant astrophysical region. In the last column, the exponents of the power of ten multiplying columns 2, 3 and 4 are given.
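To use the table, each of columns 2–4 is multiplied by ten raised to the exponent in the last column. The sketch below does this for a few rows copied from the table. It also reports the half-width of the 1σ band relative to the adopted rate; that percentage is a quantity derived here for illustration and is not reported in the source. The function name is hypothetical.

```python
# A few rows transcribed from the table above: (T in GK, adopted, lower, upper, power of ten).
ROWS = [
    (0.50, 7.65, 5.51, 9.80, -19),
    (1.00, 1.24, 0.91, 1.56, -10),
    (3.00, 0.86, 0.86, 0.86, 0),
]


def expand_row(row):
    """Multiply columns 2-4 by 10**power to obtain the full reaction rates."""
    temperature, adopted, lower, upper, power = row
    scale = 10.0 ** power
    return temperature, adopted * scale, lower * scale, upper * scale


for row in ROWS:
    t9, rate, low, high = expand_row(row)
    # Half-width of the 1-sigma band relative to the adopted value (a derived quantity).
    rel_half_width = (high - low) / (2.0 * rate)
    print(f"T = {t9:.2f} GK: rate = {rate:.3e} cm^3 s^-1 mol^-1, "
          f"band = [{low:.3e}, {high:.3e}], about +/-{100 * rel_half_width:.0f}%")
```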
Magnetic edge states and coherent manipulation of graphene nanoribbons

Hatef Sadeghi, Colin J. Lambert, Akimitsu Narita, Klaus Müllen & Lapo Bogani*


Graphene, a single-layer network of carbon atoms, has outstanding electrical and mechanical properties. Graphene ribbons with nanometre-scale widths (nanoribbons) should exhibit half-metallicity and quantum confinement. Magnetic edges in graphene nanoribbons have been studied extensively from a theoretical standpoint because their coherent manipulation would be a milestone for spintronic and quantum computing devices. However, experimental investigations have been hampered because nanoribbon edges cannot be produced with atomic precision and the graphene terminations that have been proposed are chemically unstable. Here we address both of these problems by using molecular graphene nanoribbons functionalized with stable spin-bearing radical groups. We observe the predicted delocalized magnetic edge states and test theoretical models of the spin dynamics and spin–environment interactions. Comparison with a non-graphitized reference material enables us to clearly identify the characteristic behaviour of the radical-functionalized graphene nanoribbons. We quantify the parameters of spin–orbit coupling, define the interaction patterns and determine the spin decoherence channels. Even without any optimization, the spin coherence time is in the range of microseconds at room temperature, and we perform quantum inversion operations between edge and radical spins. Our approach provides a way of testing the theory of magnetism in graphene nanoribbons experimentally. The coherence times that we observe open up encouraging prospects for the use of magnetic nanoribbons in quantum spintronic devices.

Theory predicts that graphene nanoribbons (GNRs) can have magnetic edges that display ferromagnetism and excellent spin-filtering properties, in addition to interesting quantum-coherence features. However, most GNRs do not have atomically precise edges, and bare graphene terminations are very sensitive to chemical modification, so the properties of magnetic edge states, and even whether they exist, remain uncertain. Previous results based on microscopy have been largely blind to the magnetic effects. We have developed a bottom-up molecular synthesis, which allows for the fabrication of atomically precise GNRs with various structures that can be defined uniquely by the shape of the molecular precursors. We have previously synthesized pure zigzag GNRs with localized edge states in ultrahigh vacuum, but magnetic characterization was challenging owing to chemical instability, so the spin properties of well-defined zigzag GNRs remain largely unexplored.

We overcome these challenges by injecting a spin density into the edge states of stable molecular GNRs synthesized via solution-based bottom-up chemical methods, using nitronyl nitroxide (NIT) radicals as magnetic injectors. This approach has four advantages: the groups introducing a magnetic functionality into the GNRs are well known and display interesting quantum properties; we do not rely on unobserved chemistry to create edge states; the sample can be mass-produced, instead of appearing on only one device; and we can test the classical and quantum spin properties in bulk samples.


The synthesis of NIT-radical-functionalized GNRs (NIT-GNRs) starts with Diels–Alder polymerization of a bromo-functionalized tetraphenylcyclopentadienone-based monomer (1), yielding a bromo-substituted precursor polymer (2; Fig. 1a). Palladium-catalysed cross-coupling of 2 with triphenylphosphine-gold(I)-(NIT-2-ide) yields magnetic NIT-polyphenylene, which provides a non-graphitized reference material (Fig. 1b). Graphitization of 2 yields the bromo-substituted nanoribbons (3), which are magnetically functionalized to NIT-GNRs by partial bromine substitution via cross-coupling (Fig. 1c). Size-exclusion chromatography of 2 yields an average molecular weight of 126 kg mol⁻¹, which corresponds to an average nanoribbon length of l > 100 nm. Fourier-transform infrared, Raman and ultraviolet–visible spectroscopies corroborate the well-defined NIT-GNR structure, as in previous reports, without an appreciable presence of transition-metal magnetic impurities (Supplementary Information).

The unpaired electron of the NIT resides in a π orbital that extends over two N–O groups and a C atom and overlaps considerably with the π orbitals of the aromatic backbone; the resulting spin density is injected efficiently into aromatic substituents (Supplementary Information). Modelling of NIT-GNRs using density functional theory reveals a sizeable spin density injected into the graphene backbone, which creates localized, non-dispersive states and a magnetic dispersive edge state, whereas the spins of the NIT-polyphenylene remain in completely localized states (Fig. 1b, c, Supplementary Information).

We can observe and manipulate the spin states directly using electron spin resonance (ESR), whereby the spin levels are split by a magnetic field and transitions are induced by microwave absorption. Static spectra at different frequencies (Fig. 2a) are reproduced using the spin Hamiltonian

$$H = H_Z + H_{HF} + H_D + H_{ex}$$


where the Zeeman term $H_Z = \mu_B \sum_i \mathbf{B}\cdot\mathbf{g}\cdot\mathbf{S}_i$ contains the effect of the magnetic field $\mathbf{B}$ on the $i$th spin $\mathbf{S}_i$ via the Landé tensor $\mathbf{g}$ (with $\mu_B$ the Bohr magneton); $H_{HF} = \sum_{i,n} \mathbf{S}_i\cdot\mathbf{A}_{in}\cdot\mathbf{I}_n$ is the hyperfine interaction between the electron spin $\mathbf{S}_i$ and the spin $\mathbf{I}_n$ of the $n$th nucleus, mediated by the hyperfine tensor $\mathbf{A}_{in}$; $H_D = \sum_{i,j} \mathbf{S}_i\cdot\mathbf{D}_{ij}\cdot\mathbf{S}_j$ is the dipolar coupling, with $\mathbf{D}_{ij} = \frac{\mu_0 g^2 \mu_B^2}{4\pi r_{ij}^3}\,\mathrm{diag}(-1,-1,2)$, which contains the vacuum permeability $\mu_0$ and the spin–spin distance $r_{ij}$; and $H_{ex} = \sum_{i,j} J_{ij}\,\mathbf{S}_i\cdot\mathbf{S}_j$ represents the exchange term produced by the exchange coupling $J$. The parameters that best reproduce all frequencies are: g = diag(2.0097(5), 2.0060(4), 2.0026(1)); hyperfine coupling with the 14N atoms A_N = diag(0.0, 3(2), 34(2)) MHz, tilted by γ = 9° in-plane relative to g; and D1 = 11.0 ± 0.5 MHz and D2 = 8.5 ± 0.5 MHz for the along-edge and across-edge interactions, respectively (Fig. 2b). Within error, the same results are obtained for the radicals on NIT-GNRs: g = diag(2.0098(5), 2.0059(5), 2.0026(1)), A_N = diag(0.0, 5(2), 34(2)) MHz, D1 = 11.0 ± 0.5 MHz and D2 = 8.5 ± 0.5 MHz. The inter- and intra-edge exchange interactions are J1 = −25 ± 5 MHz and J2 = 12 ± 3 MHz, in agreement with the sign expected from theoretical predictions and the Goodenough–Kanamori rules. These signals are attributed to the spin density that is localized on the NIT radicals.

In addition to these signals, NIT-GNRs display the predicted edge state as a strong feature with uniaxial anisotropy: g∥ = 2.0024(3), g⊥ = 2.0041(2). Metallic impurities would produce ESR line widths of tens of millitesla, rather than the 1–2 mT observed. Metals and spin-bearing defects in the graphene backbone would have a different hyperfine coupling from the NIT radicals and would not display all of the characteristics of NIT radicals, and double electron–electron resonance (DEER) experiments would not be possible with randomly placed impurities (see below).

The shape and line width of the ESR signal rule out magnetic impurities and are consistent with previous indications of delocalized spin states, providing conclusive evidence for the existence of the edge spin states that have long been predicted for graphene nanoribbons.

Fig. 1 | Functionalized graphene nanoribbons. a, Synthetic path to NIT-polyphenylene and NIT-GNRs; spin-bearing radicals are shown in blue. b, NIT-polyphenylene and its local density of states E. The energy levels (as functions of the wavevector k times the repeating-unit length l) show no band structure and spins localized on the NIT groups. c, A NIT-GNR and its band structure, showing localized states and spin injection inside delocalized edge states. In b and c, densities calculated for different energy ranges are depicted as green, blue and orange shaded areas (see also the arrows); the blue dashed and red solid lines correspond to local densities of spin-up and spin-down states, for a given energy interval.
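The fitted dipolar couplings (about 11.0 MHz and 8.5 MHz for the along-edge and across-edge pathways) can be related to spin–spin distances if each radical is treated as a point dipole. This sketch assumes that the quoted D values are the prefactor μ0 g² μB² / (4π h r³) of the diag(−1, −1, 2) dipolar tensor, expressed in frequency units, with g ≈ 2.0023. Both the point-dipole picture and that normalization are assumptions, and the resulting distances are rough estimates that the source does not report. The function names are hypothetical.

```python
MU0_OVER_4PI = 1e-7          # T m A^-1
MU_B = 9.2740100783e-24      # Bohr magneton, J T^-1
H_PLANCK = 6.62607015e-34    # Planck constant, J s


def dipolar_coupling_mhz(r_nm, g=2.0023):
    """Point-dipole prefactor mu0 g^2 muB^2 / (4 pi h r^3), returned in MHz."""
    r_m = r_nm * 1e-9
    return MU0_OVER_4PI * (g * MU_B) ** 2 / (H_PLANCK * r_m ** 3) / 1e6


def distance_from_coupling_nm(d_mhz, g=2.0023):
    """Invert dipolar_coupling_mhz: distance (nm) giving a coupling of d_mhz."""
    return (dipolar_coupling_mhz(1.0, g) / d_mhz) ** (1.0 / 3.0)


print(f"prefactor at 1 nm: {dipolar_coupling_mhz(1.0):.2f} MHz")
for label, d_mhz in [("D1", 11.0), ("D2", 8.5)]:
    r = distance_from_coupling_nm(d_mhz)
    print(f"{label} = {d_mhz} MHz -> point-dipole distance of about {r:.2f} nm")
```

The prefactor of roughly 52 MHz nm³ is the familiar constant used in pulsed-ESR distance work. Because the coupling scales as 1/r³, a modest change in distance produces a large change in D, which is why the weaker across-edge coupling implies only a slightly longer separation.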


Fig. 2 | Static spectra and magnetic interaction pathways. a, Multi-frequency ESR spectra for NIT-polyphenylene (green) and NIT-GNRs (red), along with simulations (black), plotted against the magnetic field from the edge-state resonance. DEER probe and pump windows are labelled as β1 and β2, respectively. b, Interaction pathways for radical spins, showing exchange (J1 and J2; blue dashed lines) and dipolar (D1 and D2;

Theory predicts that the honeycomb lattice of graphene introduces an axial spin–orbit effect, λSO, whereas the breaking of the mirror symmetry of the plane produces a Rashba-type transverse term, λR, yielding the Hamiltonian

$$H_{SO} = \pm\lambda_{SO}\,\sigma_z S_z + \lambda_R\left(\pm\sigma_x S_y - \sigma_y S_x\right)$$

where ± denotes the valley degrees of freedom, and S_i and σ_i are the spin and pseudospin Pauli matrices, respectively. The values λSO ≈ 15 μeV and λR ≈ 1 μeV are extracted by considering that |ΔE (g − g_e)| = 2λ, where g_e is the free-electron value, and perturbation theory is used to account for the effect of excited states at energy ΔE (available from the ab initio calculations). This constitutes direct experimental confirmation of tight-binding estimates of spin–orbit coupling in graphene and of its suppression compared to carbon nanotubes, as predicted by the lattice symmetry and the absence of curvature. These observations, together with the fact that the static spectra are largely insensitive to exchange interactions, indicate that the NIT-GNRs fall into a very interesting regime, in which coherent manipulation of the spins is possible.
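The perturbative relation |ΔE (g − g_e)| = 2λ links a measured g-shift to a spin–orbit energy once the excitation energy ΔE is known. The paper takes ΔE from its ab initio calculations, which are not reproduced here, so the ΔE = 1.0 eV used below is a hypothetical placeholder. The outputs are therefore illustrative and do not reproduce the quoted λ values. The function names are also hypothetical.

```python
G_FREE_ELECTRON = 2.00231930436  # free-electron g-factor


def spin_orbit_from_g_shift(g, delta_e_ev):
    """Perturbative estimate lambda = |delta_E * (g - g_e)| / 2, in eV."""
    return abs(delta_e_ev * (g - G_FREE_ELECTRON)) / 2.0


def g_shift_from_spin_orbit(lam_ev, delta_e_ev):
    """Inverse relation: |g - g_e| = 2 * lambda / delta_E."""
    return 2.0 * lam_ev / delta_e_ev


# Hypothetical excitation energy; the paper takes Delta E from ab initio calculations.
DELTA_E_EV = 1.0
lam = spin_orbit_from_g_shift(2.0024, DELTA_E_EV)
print(f"lambda for g = 2.0024 and Delta E = {DELTA_E_EV} eV: {lam * 1e6:.0f} micro-eV")
print(f"|g - g_e| implied by lambda = 15 micro-eV: "
      f"{g_shift_from_spin_orbit(15e-6, DELTA_E_EV):.1e}")
```

The inverse function shows how small the g-shifts are for spin–orbit energies in the μeV range. This is why g-values must be resolved to the fourth decimal place, using multi-frequency spectra, before λ can be extracted.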

We therefore explore the quantum spin coherence using time-resolved ESR. The quantum evolution of a spin can be represented as a movement over the Bloch sphere, with zenith positions corresponding to pure |1/2⟩ and |−1/2⟩ states and all of their possible combinations mapped on the sphere (Fig. 3a). The spin relaxation time T1 represents spin flips (vertical displacement), whereas the phase memory time T2 describes the evolution of the quantum phase information (azimuthal movement). We measure T1 using the 'picket fence' technique and Tm (a measure of the dephasing time) using the Hahn-echo decay. We fit the decay of the echo signal Y over time τ with an exponential function,

$$Y(\tau) = Y_0\, e^{-(2\tau/T_m)^{x}}\left[1 + k_1 \sin(2\omega\tau + \varphi_1) + k_2 \sin(4\omega\tau + \varphi_2)\right]$$

which includes a signal at zero time Y0, modulation by the environment at a frequency ω (amplitudes k1 and k2 for first- and second-order effects), the phases φ1 and φ2, and a stretch parameter x (Fig. 3b). We always find x = 1, which indicates that the approximation of the relaxation time is good, that successive events are uncorrelated and that Tm ≈ T2, as traditionally defined.
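The echo-decay model can be written directly as a function. With x = 1 and no modulation, ln Y is linear in τ with slope −2/Tm, so Tm can be recovered by a straight-line fit. This sketch generates a synthetic, noise-free decay with Tm = 1.1 μs (the 85 K value quoted for NIT-GNRs) on a hypothetical delay grid and recovers it with `numpy.polyfit`. Fitting real, modulated and noisy data would instead call for a nonlinear least-squares fit of the full model, and the authors' fitting procedure is not specified here.

```python
import numpy as np


def echo_model(tau, y0, t_m, x=1.0, k1=0.0, k2=0.0, omega=0.0, phi1=0.0, phi2=0.0):
    """Stretched-exponential echo decay with first- and second-order modulation."""
    decay = y0 * np.exp(-(2.0 * tau / t_m) ** x)
    modulation = (1.0 + k1 * np.sin(2.0 * omega * tau + phi1)
                  + k2 * np.sin(4.0 * omega * tau + phi2))
    return decay * modulation


def fit_tm_unmodulated(tau, y):
    """For x = 1 and no modulation, ln Y = ln Y0 - 2 tau / Tm is linear in tau."""
    slope, intercept = np.polyfit(tau, np.log(y), 1)
    return -2.0 / slope, np.exp(intercept)


tau = np.linspace(0.1e-6, 2.0e-6, 50)        # s, hypothetical delay grid
y = echo_model(tau, y0=1.0, t_m=1.1e-6)      # synthetic noise-free decay
t_m_fit, y0_fit = fit_tm_unmodulated(tau, y)
print(f"recovered Tm = {t_m_fit * 1e6:.2f} us, Y0 = {y0_fit:.3f}")
```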

The T1 values (roughly 10~° s) validate theoretical predictions. The temperature dependence of 1/T1 (Fig. 3c) shows three main regimes: a linear one below 25 K, which is characteristic of spin–phonon energy transfer; a Raman region between 25 K and 200 K, in which relaxation happens via virtual states; and a room-temperature region in which local vibrational modes have a role, with the same characteristic energy (1,354 cm⁻¹) for NIT-GNR and NIT-polyphenylene, which we tentatively assign to the N–O stretching mode. Theories of low-temperature spin–phonon relaxation in graphene quantum dots consider a deformation-potential mechanism, which is active for longitudinal acoustic phonons, and a bond-length-change mechanism, which is active for transverse and longitudinal acoustic modes. These mechanisms, in conjunction with the absence of Van Vleck cancellation, are predicted to generate the linear dependence that we observe here at low fields. The other hypothesized mechanism, spin-state admixture, can be ruled out by the temperature and field dependences that we observe and by the low value of the Rashba spin–orbit coupling, to which it is linked by symmetry selection rules.

orange lines) interactions. c, Orientation of the 14N hyperfine channel (green), with the lengths of the axes (a_x, a_y and a_z) proportional to the principal tensor elements (a_x is smaller than the width of the axis), and orientations of the local g-tensor frame of the radical (blue, g_NIT) and that of the graphene edge state (orange, g_edge).

Even without any optimization, NIT-GNRs display Tm = 0.5 μs at room temperature and Tm = 1.1 μs at 85 K (Fig. 3c, Supplementary Information), about 100 times longer than the 12 ns available in spintronic devices. These high values are probably linked to the efficient suppression of scattering in atomically regular edges. NIT-GNRs exhibit only a slight increase in Tm at lower temperatures, whereas NIT-polyphenylene exhibits a minimum at 170 K and a broad maximum at 60 K, attributable to the progressive freezing of the benzene–benzene σ bonds in the backbone. Although Tm for the localized radicals in NIT-polyphenylene might be slightly longer, the NIT-GNRs enable us to validate theories of spin relaxation in graphene, have an edge state that is connected to transport and are promising for quantum operations.

We now determine the sources of decoherence in NIT-GNRs. The modulation of the Hahn-echo amplitude (Fig. 3b) at ω/(2π) = 3.6 MHz, a frequency typical of 13C spin–nuclei interactions, suggests that hyperfine decoherence channels are important. Electron–electron double-resonance-detected nuclear magnetic resonance (EDNMR) allows us to deconvolute the different nuclear contributions (Fig. 4a). 14N coupling is dominant, which confirms the analysis of the continuous-wave spectra, but 13C single-quantum transitions, 14N and 13C double-quantum transitions and nitrogen–carbon mixed transitions also have important roles. The coupling strength to the 13C of the graphene backbone (about 10 MHz) is considerably smaller than theoretical estimates for confined graphene dots, in which anisotropic, Fermi-contact and nucleus–orbital interactions contribute to a total 13C hyperfine interaction of about 70 MHz. These couplings suggest that nuclei could be used as computational resources.
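One way to see why 3.6 MHz is typical of 13C is to compare it with the nuclear Larmor frequencies at the X-band resonance field. This sketch assumes that the echo modulation reflects 13C Larmor precession at that field, which is our reading rather than a statement in the source. It uses the 9.4 GHz frequency of Fig. 3c, the edge-state g⊥ value for the electron resonance field, and standard tabulated nuclear gyromagnetic ratios, which are supplied as assumptions.

```python
H_PLANCK = 6.62607015e-34    # Planck constant, J s
MU_B = 9.2740100783e-24      # Bohr magneton, J T^-1

# Nuclear gyromagnetic ratios gamma / 2pi in MHz T^-1 (standard tabulated values).
GAMMA_OVER_2PI_MHZ_PER_T = {"1H": 42.577, "13C": 10.708, "14N": 3.077}


def resonance_field_t(freq_hz, g):
    """Electron resonance field B = h nu / (g muB), in tesla."""
    return H_PLANCK * freq_hz / (g * MU_B)


def larmor_frequency_mhz(nucleus, field_t):
    """Nuclear Larmor frequency (gamma / 2pi) * B, in MHz."""
    return GAMMA_OVER_2PI_MHZ_PER_T[nucleus] * field_t


# X-band frequency from Fig. 3c; g is the edge-state g-perpendicular value.
b_x = resonance_field_t(9.4e9, 2.0041)
print(f"resonance field: {b_x * 1e3:.1f} mT")
for nucleus in ("1H", "13C", "14N"):
    print(f"{nucleus}: {larmor_frequency_mhz(nucleus, b_x):.2f} MHz")
```

At about 335 mT the 13C Larmor frequency comes out near 3.6 MHz, whereas 1H and 14N fall well away from it. This is consistent with attributing the observed modulation to carbon nuclei.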

Fig. 3 | Spin–lattice relaxation and spin coherence times. a, Pulse sequence used to extract the spin relaxation times and the Bloch-sphere representation. A series of π pulses (blue) erases the spin polarization (red arrow), which recovers after a time T1 during the free-evolution interval t_inv. The spins are then rotated to the x–y plane with a π/2 pulse (violet) and allowed to precess for a time τ. A π rotation in the middle of the free precession causes an echo signal when the spins regroup (violet bell curve). b, X-band Hahn-echo signal versus the delay time τ for NIT-GNRs and NIT-polyphenylene, at 85 K (black lines). The red and green dashed lines are fits to the data, which yield Tm. c, 1/T1 (filled circles, left axis) and Tm (open circles, right axis) versus temperature for NIT-GNRs (red) and NIT-polyphenylene (green), at 9.4 GHz. The solid lines are simulations for 1/T1 (see text); the vertical blue dashed lines separate the different regimes of spin–lattice relaxation.

Fig. 4 | Hyperfine coupling and multi-spin operability in GNRs. a, EDNMR sequence, using a high-turning-angle pulse (HTA) at frequency ν2 (the edge-state resonance β2; Fig. 2a) while frequency ν1 is swept and the free-induction decay (FID) is measured. The spectrum for the NIT-GNRs (Q band, 85 K) shows single-quantum (SQ), double-quantum (DQ) and combination-frequency (CF) transitions. b, DEER sequence, with ν1 set at the localized spin resonance and ν2 at the edge state (β1 and β2 in Fig. 2a), used to determine the spin–spin interactions and to perform quantum inversion operations between the edge and localized spins. c, Background-corrected time-domain DEER spectrum for NIT-polyphenylene (green) and NIT-GNRs (red). The black line singles out the low-frequency interactions. d, Fast Fourier transform (FFT) of the DEER signal of NIT-GNRs, showing the interaction energy spectrum that is characteristic of two-spin operations. The black line singles out the contribution from edges that interact with localized spins (see inset).

Finally, we consider the coupling between localized spins and the edge state. Information about electron–electron interactions is obtained by four-pulse DEER (Fig. 4b), whereby the system is initialized and probed at the radical resonance (β1 in Fig. 2a) and perturbed at the resonance condition of the edge state. The resulting spectrum displays an intriguing slow oscillation that is overlaid by fast ones (Fig. 4c). The fast period corresponds to the D1 and D2 interactions, which are too strong for accurate resolution using DEER and are better appreciated via the continuous-wave spectra. Slow oscillations correspond to interactions between localized and edge-state spins, yielding a radical–edge spin interaction of 1.5 MHz (Fig. 4d); these oscillations are absent in NIT-polyphenylene, in agreement with the lack of edge states. The edge–radical spin inversion time that we extract (about 330 ns) is considerably shorter than Tm, enabling coherent inversion operations using graphene edge states and localized spins.
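The step from a time-domain DEER trace to an interaction frequency (Fig. 4d) is a Fourier transform. The sketch below builds an idealized, background-corrected trace containing a single 1.5 MHz oscillation on a hypothetical sampling grid, locates the spectral peak with NumPy's real FFT and takes half the oscillation period. Half a period of a 1.5 MHz oscillation is about 333 ns, close to the roughly 330 ns inversion time quoted above. The source does not state this relation explicitly, so treat it as our back-of-envelope reading. Real DEER data would also need background removal and windowing.

```python
import numpy as np

DT = 4e-9           # s, hypothetical sampling step
N_POINTS = 2000     # 8 us trace, giving 125 kHz frequency resolution
F_COUPLING = 1.5e6  # Hz, radical-edge interaction quoted in the text

t = np.arange(N_POINTS) * DT
# Idealized background-corrected DEER trace: a single undamped oscillation.
trace = np.cos(2.0 * np.pi * F_COUPLING * t)

spectrum = np.abs(np.fft.rfft(trace))
freqs = np.fft.rfftfreq(N_POINTS, d=DT)
peak_hz = freqs[1:][np.argmax(spectrum[1:])]   # skip the zero-frequency bin

# Time for the coupling to invert the partner spin: half an oscillation period.
t_inversion = 1.0 / (2.0 * peak_hz)
print(f"peak at {peak_hz / 1e6:.2f} MHz, half period {t_inversion * 1e9:.0f} ns")
```

The trace length was chosen to hold an integer number of cycles, so the peak falls exactly on one frequency bin. With arbitrary lengths, spectral leakage would broaden the peak and a window function would be advisable.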

This finding, in conjunction with recent results on transport in molecular nanoribbons, could lead to fascinating possibilities: quantum operations can in principle be performed via single-electron transport, and the spin states detected electrically, potentially making our NIT-GNRs ideal candidates for quantum nanoelectronic devices. The interaction of multiple radical spins with a coherent, delocalized edge state could allow a single flowing electron to transmit entanglement along the spin ensemble. Furthermore, such molecular nanoribbons are useful for testing fundamental theories of graphene. Our measurements of spin–orbit, hyperfine and edge-spin coupling already reveal physics that would otherwise be accessible only by overcoming the present challenges in studies of the quantum Hall effect at sub-millikelvin temperatures. Detailed access to the spin dynamics, together with an atomically defined structure, opens up a path to the quantitative analysis of spin–phonon interactions in graphene dots. The study of different molecular spin injectors and of different aromatic backbones (which can be used, for example, to modulate the spin coupling) provides the foundations for an area of chemistry that mixes molecular magnetism and graphene. Environmental effects, such as GNR–GNR or GNR–substrate interactions, are an interesting area of future research; calculations show, encouragingly, no detrimental effect on the spin density from deposition on hexagonal boron nitride (Supplementary Information). Because 14N hyperfine coupling is a dominant decoherence channel, there is ample room to increase T2, for example, by dynamic nuclear spin polarization, isotopic substitution or chemical engineering. Full investigations of magnetic doping effects and of incomplete edge functionalization with radicals are currently under way. We expect the findings to reveal GNRs as powerful tools for investigating finite-size effects in quantum Heisenberg spin chains.


Online content 

Any Methods, including any statements of data availability and Nature Research reporting summaries, along with any additional references and Source Data files, are available in the online version of the paper at https://doi.org/10.1038/s41586-018-0154-7.
METHODS


Precursors. All reactions dealing with air- or moisture-sensitive compounds were carried out in a dry reaction vessel under argon. The starting materials S1, Poly-Br and GNR-Br were synthesized by adapting published protocols; the detailed synthesis is reported in Supplementary Information. Unless otherwise noted, materials were purchased from commercial suppliers (Fluka, Aldrich, Acros, ABCR, Merck and others) and used as received without further purification.

Material characterization. Analytical size-exclusion chromatography (SEC) was performed on SDV PSS GPC columns using THF as the eluent at a temperature of 303 K. Absorbance was determined on a UV S-3702 detector (SOMA) at a fixed wavelength of 270 nm. The samples were referenced with respect to standard polystyrene (PS) as well as poly(p-phenylene) (PPP) calibration curves. Solution UV–Vis absorption spectra were recorded at room temperature on a Perkin-Elmer Lambda 900 spectrophotometer. GNR samples were dispersed in N-methyl-pyrrolidone (NMP) by using sonication (30 min) in a Branson-1510 ultrasonicator followed by filtration through 5-μm polytetrafluoroethylene (PTFE) filters. Infrared spectra were measured on a Nicolet 730 FT-IR spectrometer equipped with an attenuated total reflection (ATR) set-up. The samples were deposited as pristine material on the diamond crystal and pressed on it with a stamp. Measurements with a scan number of 128 were recorded for each sample and the background was subtracted.

ESR spectroscopy. X-band (about 9.5 GHz) and W-band (about 94 GHz) ESR spectra were acquired using a Bruker EleXsys E680. Q-band ESR spectra were obtained on a Bruker EleXsys E580. X-band continuous-wave (CW) spectra were additionally recorded on a Bruker EMX. The sample temperature was maintained with an Oxford Instruments CF9350 cryostat and controlled with an Oxford Instruments ITC503. The microwave resonators that we used were Bruker ER4118X-MS-3W1 for X-band (Bruker E680), Bruker EN510702 for Q-band and Bruker EN600-1021H for W-band measurements. The sample was prepared as loose powder in low-background quartz glass tubes. X-band CW measurements were calibrated using polystyrene. High-field measurements at Q-band and W-band frequencies were calibrated using a MgO standard.

CW ESR. CW measurements were performed at X-band and Q-band frequencies 
at room temperature. The modulation amplitude was set to 1 G and the modula- 
tion frequency to 100 kHz for X-band and to 50 kHz for Q-band measurements. 
Spectra were simulated using the Matlab package EasySpin. Errors on the g, A and
D values are obtained as standard deviations from multiple sets of measurements. 
Errors on J values are estimated from goodness of simulation. 

FID-detected ESR. Free-induction-decay (FID)-detected ESR spectra at W-band frequencies were obtained by exciting the spin system with a long π pulse. This enables detectable FID signals to persist after the receiver protection switch. Integrating the signal against the field gives a result very close to the integral of the corresponding CW ESR experiment. The measurement was performed at 85 K to ensure a strong signal. Spectra were simulated using the Matlab package EasySpin.

Determination of T1. In the picket-fence pulse sequence (Fig. 3a), which we used to determine the spin–lattice relaxation time T1, we set the π/2 and π pulse durations to 16 ns and 32 ns, respectively, and used a four-step phase-cycling procedure. The resulting data were fitted with a sum of two exponential functions, one accounting for spin–lattice relaxation and the other for spectral diffusion processes. Spectral diffusion arises from electron–electron and electron–nuclear couplings, thus adding another relaxation mechanism. The picket-fence pulse sequence minimizes this contribution.

Fit of T1. We describe the spin–lattice relaxation by using three main mechanisms, which scale differently with temperature T. By considering direct relaxation, Raman processes and the excitation of local vibrational modes, we arrive at:

where A_dir, A_Ram and A_loc denote the rate constants of the three processes, Θ_D is the Debye temperature, x is an integration variable and Δ_loc is the excitation energy of a local vibrational mode. For both compounds, we find Θ_D = 200 K, A_loc = 3.5 × 10⁷ s⁻¹ and Δ_loc = 1,950 K (1,354 cm⁻¹), in excellent agreement with the local excitation of the stretching modes of the N–O bond, as obtained by infrared spectroscopy. Furthermore, we find A_dir = 12 s⁻¹ and A_Ram = 3.0 × 10⁻¹⁶ s⁻¹ for NIT-polyphenylene, and A_dir = 55 s⁻¹ and A_Ram = 2.2 × 10⁻¹⁶ s⁻¹ for NIT-GNRs. The relaxation is affected by electron–electron interaction effects because our samples are highly spin-concentrated. This results in a linear region even above 10 K, which is not related to direct relaxation.
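The fitted expression for 1/T1 did not survive extraction. The sketch below uses a commonly used three-term form that matches the description: a direct term linear in T, a Debye–Raman term with the transport integral J8 over x up to Θ_D/T, and a local-mode term with an Orbach-like temperature factor. The normalization of the Raman term, written here as A_Ram T⁹ J8(Θ_D/T), is an assumption; the authors' exact expression and the units of A_Ram may differ. The constants are the NIT-GNR values quoted above, Δ_loc = 1,950 K is consistent with 1,354 cm⁻¹ × 1.4388 K cm, and the function names are hypothetical.

```python
import math


def j8(upper, steps=4000):
    """J8(z) = integral from 0 to z of x^8 e^x / (e^x - 1)^2 dx (midpoint rule).

    The integrand is rewritten as x^8 e^-x / (1 - e^-x)^2 to avoid overflow.
    """
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x ** 8 * math.exp(-x) / (-math.expm1(-x)) ** 2
    return total * h


def inverse_t1(temp_k, a_dir, a_ram, a_loc, theta_d=200.0, delta_loc=1950.0):
    """Assumed three-term model for 1/T1 in s^-1: direct + Raman + local mode."""
    direct = a_dir * temp_k
    raman = a_ram * temp_k ** 9 * j8(theta_d / temp_k)
    w = math.exp(-delta_loc / temp_k)       # e^(-Delta/T)
    local = a_loc * w / (1.0 - w) ** 2      # equals e^(Delta/T) / (e^(Delta/T) - 1)^2
    return direct + raman + local


# Constants quoted above for NIT-GNRs (Raman normalization assumed, see lead-in).
NIT_GNR = {"a_dir": 55.0, "a_ram": 2.2e-16, "a_loc": 3.5e7}
for t in (5, 25, 100, 200, 300):
    print(f"T = {t:3d} K: 1/T1 ~ {inverse_t1(t, **NIT_GNR):.3e} s^-1")
```

Under these assumptions the direct term dominates at the lowest temperatures, the Raman term takes over at intermediate temperatures and the local-mode term becomes comparable near room temperature. This mirrors the three regimes described in the main text.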

Hahn echo. The Hahn-echo sequence (Fig. 3a) enables the determination of the phase memory time Tm, which represents an upper limit for T2, because several effects, such as instantaneous and spectral diffusion, affect the measurement. We set the pulse durations to 16 ns for the π/2 pulse and 32 ns for the π pulse, phased the signal and integrated over the second half of the echo. We used a 16-step phase-cycling procedure. We identified Tm with T2.

EDNMR spectroscopy. EDNMR was performed at Q-band frequencies and 85 K. Pulse shapes were formed using an arbitrary-waveform generator. The duration of the Gaussian-shaped excitation pulse was set to 1,200 ns, which allows for the excitation of forbidden nuclear transitions. The detection π pulse, applied after a delay t, was set to 400 ns and the FID was recorded. The central line around Δν = 0 was removed by using Voigtian and additional Lorentzian line shapes.

DEER spectroscopy. DEER measurements were performed at Q-band frequencies and 85 K. Pulses were created using an arbitrary-waveform generator. The durations of the detection pulses were set to 12 ns and 18 ns for NIT-GNRs and NIT-polyphenylene, respectively. The duration of the pump pulse (ELDOR pulse) was set to 20 ns. We integrated over the resulting echo against the variable time delay of the ELDOR pulse. The time-domain DEER data shown were obtained after normalization, long-pass filtering (23.5-MHz threshold) and subtraction of a three-dimensional background using the Matlab package DeerAnalysis.

Numerical calculations. Theoretical modelling was performed using the Gaussian and SIESTA implementations of density functional theory. For the spin density calculations of the two systems shown in Fig. 1, Gaussian was used with the ulsda/6-311++g(d,p) functional and basis set and the XQC method for the self-consistent reaction field. We found a similar spin density using SIESTA. The generalized gradient approximation (GGA) of the exchange and correlation functional was used with the Perdew–Burke–Ernzerhof parameterization, a double-ζ basis set, a real-space grid defined with an equivalent energy cut-off of 150 Ry and a force tolerance of less than 10 meV Å⁻¹. For the band structure calculation, each structure was sampled by a 1 × 1 × 15 Monkhorst–Pack k-point grid. We found the stable magnetic state by allowing the system to be spin-polarized. Apart from the two edge atoms on the peripheral NIT fivefold rings, atoms were numbered such that odd-numbered atoms were connected only to odd-numbered atoms and the same for even-numbered ones. We then performed geometry optimization of each system by choosing the initial system to have a ferromagnetic configuration (all spin-up), or an antiferromagnetic configuration in which the odd (even) atoms were designated spin-up (spin-down). The total energy per unit cell of the antiferromagnetically aligned NIT-GNRs (NIT-polyphenylene) is 272 meV (651 meV) lower than that of the ferromagnetically aligned ones. The molecular orbitals, band structure and local density of states calculations were obtained using the antiferromagnetic spin alignment.

Data availability. The datasets generated and analysed during this study are avail- 
able from the corresponding author on reasonable request. 
Approaching the Schottky–Mott limit in van der Waals metal–semiconductor junctions

Xiangfeng Duan*


The junctions formed at the contact between metallic electrodes 
and semiconductor materials are crucial components of electronic 
and optoelectronic devices!. Metal-semiconductor junctions are 
characterized by an energy barrier known as the Schottky barrier, 
whose height can, in the ideal case, be predicted by the Schottky- 
Mott rule*~“* on the basis of the relative alignment of energy levels. 
Such ideal physics has rarely been experimentally realized, however, 
because of the inevitable chemical disorder and Fermi-level pinning 
at typical metal-semiconductor interfaces”>!*. Here we report the
creation of van der Waals metal-semiconductor junctions in which 
atomically flat metal thin films are laminated onto two-dimensional 
semiconductors without direct chemical bonding, creating an 
interface that is essentially free from chemical disorder and Fermi- 
level pinning. The Schottky barrier height, which approaches the 
Schottky-Mott limit, is dictated by the work function of the metal 
and is thus highly tunable. By transferring metal films (silver or 
platinum) with a work function that matches the conduction band 
or valence band edges of molybdenum sulfide, we achieve transistors 
with a two-terminal electron mobility at room temperature of 
260 centimetres squared per volt per second and a hole mobility 
of 175 centimetres squared per volt per second. Furthermore, by 
using asymmetric contact pairs with different work functions, we 
demonstrate a silver/molybdenum sulfide/platinum photodiode 
with an open-circuit voltage of 1.02 volts. Our study not only 
experimentally validates the fundamental limit of ideal metal- 
semiconductor junctions but also defines a highly efficient and 
damage-free strategy for metal integration that could be used in 
high-performance electronics and optoelectronics. 

Metal-semiconductor junctions are at the heart of modern electronics and optoelectronics. One of the most important parameters of a metal-semiconductor junction is the Schottky barrier height ($\Phi_{SB}$), the energy barrier that a charge carrier must overcome to cross the junction, which fundamentally determines the charge transport efficiency and affects device performance!”. In an ideal metal-semiconductor junction, $\Phi_{SB}$ can be predicted by the Schottky-Mott rule, a law first proposed in the 1930s and governed by electrostatics, as are all problems involving energy-level alignment**:


$$\Phi_{SB,n} = \Phi_M - \chi_S \qquad (1)$$

$$\Phi_{SB,p} = I_S - \Phi_M \qquad (2)$$


where $\Phi_M$ is the work function of the metal, $\chi_S$ and $I_S$ are the electron affinity and ionization potential of the semiconductor, and $\Phi_{SB,n}$ and $\Phi_{SB,p}$ are the Schottky barrier heights for electrons and holes, respectively. These quantities are intrinsic properties of the isolated materials before they form the junction, and the Schottky-Mott model implies that $\Phi_{SB}$ depends linearly on the metal work function with a slope of unity.
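Equations (1) and (2) are plain arithmetic, which makes them easy to tabulate. The Python sketch below evaluates them for the four contact metals studied here, using the approximate work functions quoted later in the main text. The MoS₂ electron affinity (4.3 eV) and bandgap (1.2 eV) are illustrative assumptions, not values reported in this paper, and the helper names are hypothetical.

```python
# Ideal Schottky-Mott barrier heights, equations (1) and (2).
# ASSUMED MoS2 parameters (illustrative only, not reported in this paper):
# electron affinity chi_S = 4.3 eV and bandgap 1.2 eV, so I_S = 5.5 eV.

CHI_S = 4.3           # assumed electron affinity of MoS2 (eV)
I_S = CHI_S + 1.2     # assumed ionization potential = affinity + bandgap (eV)

# Approximate work functions quoted in the main text (eV).
WORK_FUNCTIONS = {"Ag": 4.3, "Cu": 4.6, "Au": 5.1, "Pt": 5.6}


def schottky_mott_barriers(phi_m, chi_s, i_s):
    """Return (Phi_SB,n, Phi_SB,p) in eV for a metal with work function phi_m."""
    phi_n = phi_m - chi_s   # equation (1): electron barrier
    phi_p = i_s - phi_m     # equation (2): hole barrier
    return phi_n, phi_p


def preferred_carrier(phi_n, phi_p, tol=0.05):
    """Label the carrier facing the lower ideal barrier (hypothetical helper)."""
    if abs(phi_n - phi_p) < tol:
        return "ambipolar"
    return "electrons" if phi_n < phi_p else "holes"


for metal, phi_m in WORK_FUNCTIONS.items():
    phi_n, phi_p = schottky_mott_barriers(phi_m, CHI_S, I_S)
    print(f"{metal}: Phi_SB,n = {phi_n:+.2f} eV, Phi_SB,p = {phi_p:+.2f} eV, "
          f"favours {preferred_carrier(phi_n, phi_p)}")
```

A negative barrier means that, in the ideal model, the metal Fermi level lies beyond the band edge and the contact is ohmic for that carrier. The sum $\Phi_{SB,n} + \Phi_{SB,p}$ always equals the bandgap, independent of the metal.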


Experimentally, however, the Schottky-Mott model gives grossly incorrect predictions for the Schottky barrier height”: $\Phi_{SB}$ is usually insensitive to $\Phi_M$, and the Fermi level of the system is typically pinned
to a nearly fixed position within the semiconductor bandgap, varying 
little with respect to different metals used, as first noted by Bardeen 
in 1947 (ref. °). The strength of Fermi-level pinning (FLP) for a given 
semiconductor can be characterized by the interface S parameter: 


$$S = \left|\frac{d\Phi_{SB}}{d\Phi_M}\right| \qquad (3)$$


If $S = 1$, the Schottky-Mott limit is achieved. Unfortunately, S is generally far less than unity for most typical semiconductors (approximately 0.27 for Si and 0.07 for GaAs)!?, and the Schottky-Mott limit has not been experimentally achieved in traditional metal-semiconductor junctions.
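In practice, the S parameter of equation (3) is the slope of a straight-line fit to measured $(\Phi_M, \Phi_{SB})$ pairs. The sketch below generates synthetic barriers from a commonly used linear pinning model, $\Phi_{SB,n} = S(\Phi_M - \Phi_{CNL}) + (\Phi_{CNL} - \chi_S)$, where the charge-neutrality level $\Phi_{CNL}$ and all numbers are assumptions for illustration, and then recovers S with a least-squares fit.

```python
import numpy as np


def pinned_barrier(phi_m, s, chi_s, phi_cnl):
    """Electron barrier in an ASSUMED linear pinning model:
    Phi_SB,n = S * (Phi_M - Phi_CNL) + (Phi_CNL - chi_S).
    S = 1 recovers equation (1); S -> 0 pins the barrier at Phi_CNL - chi_S."""
    return s * (phi_m - phi_cnl) + (phi_cnl - chi_s)


def interface_s_parameter(phi_m, phi_sb):
    """Equation (3): |slope| of a least-squares line through (Phi_M, Phi_SB).
    The absolute value makes it valid for hole barriers too (ideal slope -1)."""
    slope, _intercept = np.polyfit(phi_m, phi_sb, deg=1)
    return abs(slope)


phi_m = np.array([4.3, 4.6, 5.1, 5.6])   # work functions (eV)
for s_true in (1.0, 0.27, 0.07):          # ideal, Si-like and GaAs-like pinning
    phi_sb = pinned_barrier(phi_m, s_true, chi_s=4.3, phi_cnl=4.5)  # synthetic data
    print(f"S used = {s_true:.2f}, S fitted = {interface_s_parameter(phi_m, phi_sb):.2f}")
```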

This striking discrepancy between theory and experiment arises because the Schottky-Mott model considers only ideal physics and neglects several types of chemical interaction that are hard to avoid at the interface of two dissimilar materials. First, owing to the termination of the crystal structure and incomplete covalent bonds, surface dangling bonds or surface reconstructions lead to surface states (Bardeen pinning effect or Shockley-Tamm states) within the semiconductor bandgap and result in FLP at these energy levels’. Second, the contact interface is rarely an atomically sharp discontinuity between the metal and the semiconductor crystal; instead, chemical bonds form there and modify the original energy levels. Chemical bonding between the metal and the semiconductor, and their interdiffusion, can also create large strain in both crystal lattices and change the band structures, as well as the resulting barrier height®°. Third, the typical processes for material integration and device fabrication usually introduce additional chemical disorder and defect-induced gap states that serve as a reservoir for electrons or holes and therefore pin the Fermi level’. For example, ‘high-energy’ metal deposition processes usually involve atom or cluster bombardment and strong local heating of the contact region, which could damage the crystal lattice at or near the interface!, as commonly observed in III-V compound semiconductors!!; moreover, the resist development process could also leave polymer residue at the interface that causes the overall measured barrier height to deviate from the predicted value. Fourth, metal-induced gap states are formed in the junction owing to the decaying metallic wavefunctions that penetrate to nanometre depth into the semiconductor’.

Here we report the design and creation of van der Waals (vdW) metal-semiconductor junctions, in which metal electrodes with atomically flat surfaces are pre-fabricated and physically laminated’ onto dangling-bond-free two-dimensional (2D) semiconductors without direct chemical bonding, avoiding the associated chemical disorder and defect-induced gap states. The fabrication process flow is illustrated in Fig. 1a. Briefly, a series of metal electrodes with various work functions are first prepared on a silicon substrate. They can be mechanically released from the silicon and display an atomically flat surface, replicating the flat surface of the substrate (see Methods and Extended Data Fig. 1 for details). Next, few-layer MoS₂ flakes (4-20 nm thick, unless otherwise specified) are mechanically exfoliated on top of highly doped silicon (p⁺⁺) covered with SiO₂ (300 nm) and poly(methyl methacrylate) (PMMA, 170 nm) as dielectrics (Fig. 1a, ii). PMMA here functions as a dielectric that is nearly free from trap states'*'® and is essential for retaining the nearly intrinsic electronic properties of MoS₂, in contrast to a conventional MoS₂/SiO₂ interface with abundant trap states (Extended Data Fig. 2). Next, the previously released metal electrodes are aligned under a microscope and physically laminated on top of the MoS₂ flake, resulting in an atomically sharp and clean metal-semiconductor interface (Fig. 1a, iii). Finally, the PMMA on top of the contact pads is removed using standard electron-beam lithography and development processes, leaving the exposed metal pads for electrical probing and measurements (Fig. 1a, iv).

Fig. 1 | Illustration and structural characterizations of vdW metal-semiconductor junctions. a, Schematic illustrations of vdW integration of metal-semiconductor junctions: (i) metal deposition on sacrificial substrate; (ii) peeling off the metal; (iii) alignment; and (iv) contact lamination and probe window opening. b-d, Cross-sectional schematics and TEM images of the transferred Au electrode on top of MoS₂ with atomically sharp and clean metal-semiconductor interfaces. e, Optical image of the MoS₂ device with transferred electrodes (upper) and with the transferred electrodes mechanically released (lower). The underlying MoS₂ layer retains its original shape after physical integration and separation of the Au thin-film electrodes, indicating that the transferred metal-semiconductor interface is free of direct chemical bonding. f-h, Cross-sectional schematics and TEM images of conventional electron-beam-deposited Au electrodes on top of MoS₂, where the bombardment of the MoS₂ surface by high-energy Au atoms and clusters creates considerable damage to the MoS₂ surface, producing a glassy layer with apparent defects, interface diffusion, chemical bonding and atomic disorder. i, Optical image of a MoS₂ flake with deposited electrodes (upper) and with the deposited electrodes mechanically released (lower), where the underlying MoS₂ surface is destroyed while removing the deposited electrodes, suggesting direct chemical bonding and strong metal-semiconductor interaction in deposited junctions.

This vdW integration of metal thin-film electrodes and 2D semiconductors has several advantages that could overcome the typical FLP limitations and lead to an interface approaching the ideal physical model. First, on the semiconductor side, the dangling-bond-free surface of the 2D semiconductors could eliminate the Shockley-Tamm states that dominate a three-dimensional covalent semiconductor surface with abundant surface dangling bonds or surface reconstructions!”. Second, the physical transfer of pre-fabricated metal electrodes offers a gentle, ‘low-energy’ materials integration strategy without conventional aggressive fabrication processes (for example, lithography or deposition), preventing the creation of defects, residues, strains and the associated defect-induced gap states on a dangling-bond-free 2D semiconductor surface. This can be clearly seen in cross-sectional transmission electron microscopy (TEM) images, in which the transferred metal/MoS₂ junctions feature an atomically sharp and clean interface (Fig. 1b-d), whereas the deposited metal/MoS₂ interfaces show considerable defects, strain, disorder and metal diffusion (Fig. 1f-h). Third, the physical contact without direct chemical bonding can greatly suppress the interface dipoles and metal-induced gap states!?- 7”.

To demonstrate the weak vdW interaction at the interface, we mechanically separated the transferred metal electrodes from MoS₂ after device fabrication and electrical measurement. The underlying semiconductor retains its original shape without any apparent damage (Fig. 1e). In contrast, the deposited metal electrodes typically form strong chemical bonds with the underlying MoS₂ (such as Au-S bonds), generating a glassy layer dominated by interdiffusion and strain. When the deposited metal electrodes are mechanically peeled”, the underlying MoS₂ is destroyed at the same time (Fig. 1i). The reversible physical integration and separation of the transferred metal-semiconductor junctions are strong indicators of ideal interfaces, where two crystals in intimate contact retain their isolated states without direct chemical bonding.
Fig. 2 | Transfer characteristics of MoS₂ transistors with deposited and transferred metal electrodes. a-d, $I_{ds}$-$V_{gs}$ transfer curves of MoS₂ transistors with Ag (a), Cu (b), Au (c) and Pt (d) electrodes deposited by electron-beam evaporation. We always observe n-type behaviour irrespective of the highly distinct work function of the contact metal used, which suggests strong Fermi-level pinning near the conduction band edge. e-h, $I_{ds}$-$V_{gs}$ transfer curves of MoS₂ transistors with transferred Ag (e), Cu (f), Au (g) and Pt (h) electrodes. The device switches from n-type to p-type characteristics with increasing work function of the contact electrodes, suggesting highly tunable electron and hole barriers depending on the work function of the transferred contact metal used. The bias voltage is 100 mV, and the gate dielectric is composed of 300-nm-thick SiO₂ and 170-nm-thick PMMA for all measurements. Evap., evaporated; trans., transferred.

With minimized interface disorder and weak metal-semiconductor interaction, the vdW-contacted MoS₂ transistors exhibit highly tunable device characteristics dictated by the metal work functions. Figure 2 shows the $I_{ds}$-$V_{gs}$ transfer curves of MoS₂ transistors contacted by a series of transferred metals with various work functions. For comparison, we also characterized devices using the same metal electrodes prepared by conventional electron-beam evaporation. In general, the MoS₂ devices contacted by conventionally evaporated metals show n-type behaviour regardless of the work function of the metal used (Fig. 2a-d), which is consistent with previous studies”4 and strongly indicates that the Fermi level is pinned near the conduction band edge of MoS₂.

By contrast, for devices with transferred metal electrodes, the majority carrier type can be systematically tailored from electrons to holes by varying the work function of the contact metals (Fig. 2e-h). For example, for Ag, which has a low work function ($W_{Ag} \approx 4.3$ eV), well-behaved n-type transfer curves are observed (Fig. 2e) with a typical metal-insulator transition behaviour*>”°, indicating an optimized metal-semiconductor contact and a low electron barrier. Next, for Cu, which has a medium work function ($W_{Cu} \approx 4.6$ eV), the device exhibits an ambipolar transfer curve with predominantly n-type behaviour (Fig. 2f). Compared with Ag-contacted devices, the $I_{ds}$ current is three orders of magnitude smaller at room temperature and decreases exponentially with decreasing temperature, demonstrating a relatively large n-type Schottky barrier. On further increasing the metal work function by using a Au electrode ($W_{Au} \approx 5.1$ eV), the device exhibits predominantly p-type behaviour with a low current level (nanoamperes) (Fig. 2g). The $I_{ds}$ drops quickly with decreasing temperature, suggesting that a large p-type Schottky barrier dominates the overall carrier transport. This is in contrast to previous devices (and the control samples in Fig. 2c) in which deposited Au electrodes act as n-type ohmic contacts to MoS₂. In those devices, the formation of Au-S bonds”” dominates the carrier transport, with strong FLP near the conduction band edge (Fig. 2c). Finally, for transferred Pt, with the highest work function ($W_{Pt} \approx 5.6$ eV), the device shows well-behaved p-type characteristics (Fig. 2h) with an ohmic $I_{ds}$-$V_{ds}$ output curve (Extended Data Fig. 3). When the temperature is reduced, a p-type metal-insulator transition is observed in MoS₂, suggesting an optimized p-type contact with a negligible hole barrier. In contrast, the device with deposited Pt electrodes exhibits poor n-type behaviour due to FLP near the conduction band edge (Fig. 2d), consistent with previous studies”.


Our results above demonstrate that carrier transport in MoS₂ transistors can be systematically tailored by using transferred metal contacts with different work functions. To further evaluate the dependence on different metals, we extracted the Schottky barrier height using the equation:

$$I_{ds} = A A^{*} T^{2} \exp\left(-\frac{\Phi_{SB}}{kT}\right) \qquad (4)$$

where $I_{ds}$ is the current through the device, A is the junction area, $A^{*}$ is the Richardson constant, k is the Boltzmann constant and T is the temperature. We note that $\Phi_{SB}$ here is extracted under the flat-band condition, in which the tunnelling current across the Schottky barrier can be minimized***; a detailed description of the extraction can be found in Methods and Extended Data Fig. 4. Figure 3 shows the extracted Schottky barrier height for the different metals used in our study as a function of the corresponding work functions; the solid line is a linear fit to the results, the slope of which corresponds to the interface S parameter. For control devices with deposited metals, the extracted S parameter is 0.09, consistent with previous studies” of MoS₂ with S ≈ 0.1, confirming strong FLP near the conduction band edge at the metal/MoS₂ interface (largely due to fabrication-induced defects and gap states; see Fig. 1f-h). By contrast, for the devices with transferred metal electrodes, the $\Phi_{SB}$ value is strongly dependent on the metal work function, and the Schottky barrier type can be tuned from electrons to holes. The fitted S parameter is 0.96, approaching the Schottky-Mott limit defined by electrostatic energy alignment. S is also much larger than the previously reported values’ of 0.27 for Si and 0.07 for GaAs, indicating a nearly ideal interface between the physically transferred metal contact and the dangling-bond-free 2D semiconductor surface, in contrast to the inevitable chemical disorder and FLP at conventionally fabricated metal-semiconductor interfaces.
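Equation (4) becomes linear after taking logarithms, which is what the Arrhenius analysis in Methods exploits. As a minimal sketch, the code below generates synthetic currents from equation (4) for an assumed barrier of 0.10 eV and an arbitrary prefactor standing in for $AA^{*}$, then recovers the barrier as the slope of $\ln(I_{ds}/T^2)$ against $-1/kT$. Real data would be taken at a fixed gate voltage near flat band; the temperature range and all numerical inputs here are assumptions.

```python
import numpy as np

K_B = 8.617333262e-5  # Boltzmann constant (eV/K)


def thermionic_current(temps, phi_sb, prefactor):
    """Equation (4): I = A A* T^2 exp(-Phi_SB / kT); prefactor stands in for A A*."""
    return prefactor * temps**2 * np.exp(-phi_sb / (K_B * temps))


def extract_barrier(temps, currents):
    """Equation (5): the slope of ln(I/T^2) versus -1/kT is Phi_SB (eV)."""
    x = -1.0 / (K_B * temps)
    y = np.log(currents / temps**2)
    slope, _c = np.polyfit(x, y, deg=1)
    return slope


temps = np.linspace(100.0, 300.0, 9)                                # K (assumed range)
currents = thermionic_current(temps, phi_sb=0.10, prefactor=1e-6)   # synthetic data
print(f"extracted Phi_SB = {extract_barrier(temps, currents):.3f} eV")
```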

The ability to prepare atomically sharp and atomically clean metal-semiconductor interfaces and to tailor the Schottky barrier height opens a pathway to overcome the FLP effect that plagues 2D semiconductor devices and to improve their performance. For instance, by applying transferred Ag electrodes with a low electron barrier, we have fabricated an n-channel MoS₂ transistor with a two-terminal electron mobility ($\mu_e$) reaching 260 cm² V⁻¹ s⁻¹, considerably higher than previous reports for two-terminal back-gated MoS₂ devices”* that did not exclude contact resistance (Extended Data Fig. 5 and Extended Data Table 1). By shortening the channel length to 160 nm, we can further increase the current density to 0.66 mA µm⁻¹ at room temperature (Extended Data Fig. 6), comparable to the best reported devices using vdW graphene hybrid contacts or strong contact doping (Extended Data Table 1). Similarly, by using transferred Pt contacts with a minimized barrier for holes in MoS₂, we have achieved a higher two-terminal hole mobility ($\mu_h$) of 175 cm² V⁻¹ s⁻¹ and hole current density of 0.21 mA µm⁻¹ in p-channel MoS₂ transistors than in previous work (Extended Data Figs. 3 and 6, and Extended Data Table 1). It should be noted that although the non-bonding vdW gap may pose an additional series tunnelling resistance, its value is too small to affect the overall carrier transport (see Methods) and can be largely neglected.

Fig. 3 | Experimentally determined Schottky barrier height for different transferred metals and evaporated metals. For transferred metal electrodes, the majority carrier type and the corresponding Schottky barrier height are strongly dependent on the metal work function, with a slope (S = 0.96) approaching unity, suggesting excellent agreement with the Schottky-Mott law. With the conventional evaporation-deposited metal electrodes, the devices invariably show n-type behaviour with a small electron Schottky barrier and a slope S = 0.09, indicating the strong pinning effect at the metal-semiconductor interface.

Taking the method a step further, we can also transfer pairs of metals with distinct work functions to enable high-performance optoelectronic devices beyond the reach of traditional processes. For example, we have created a metal-semiconductor-metal (MSM) photodiode by using a transferred Ag and Pt electrode pair as vdW contacts (Fig. 4a). With the asymmetric Ag and Pt contacts, the device shows rectification behaviour, with a rectification ratio of up to 10⁸ and an ideality factor n of 1.09 (Fig. 4b and Extended Data Fig. 7). The near-unity ideality factor obtained in the transferred MSM diode is much better than that of the deposited MSM device with n > 1.8 (Extended Data Fig. 8), confirming the high interfacial quality of the vdW metal-semiconductor junctions.

Under wide-field laser illumination (532 nm, 10 nW µm⁻²), our Ag-MoS₂-Pt MSM photodiodes with transferred contacts produce a remarkable open-circuit voltage $V_{oc}$ of 1.02 V in the monolayer MoS₂ device (bandgap $E_g \approx 1.8$ eV) and 0.76 V in the seven-layer MoS₂ device ($E_g \approx 1.2$ eV) (Fig. 4c and Extended Data Fig. 7). The $V_{oc}$ value of 0.76 V is more than twice that of a control device with deposited Pt-Ag contacts (Extended Data Fig. 8). Overall, the $V_{oc}$ achieved in the MSM diode with transferred Ag and Pt contacts is considerably higher than those of previously reported 2D semiconductor MSM or p-n photodiodes (0.1-0.8 V) (Extended Data Table 2). The lower $V_{oc}$ obtained in previous 2D semiconductor photodiodes may be partly attributed to the difficulty of achieving a low contact barrier for both electrons and holes in the same device. By contrast, with the optimized asymmetric vdW contacts in the Ag-MoS₂-Pt MSM photodiodes, the contact barrier for electrons is minimized at the Ag-MoS₂ interface and the contact barrier for holes is minimized at the MoS₂-Pt interface, thus ensuring a high $V_{oc}$. The Ag-MoS₂-Pt photodiodes give photoresponsivities of 7.2 mA W⁻¹ and 16.6 mA W⁻¹, and external quantum efficiencies of 1.74% and 4.5% for the monolayer and seven-layer devices, respectively, higher than those of previous p-n junctions made by dual-gated WSe₂ (about 0.2%)*?. A maximum electrical power output of 0.3 nW (3.7 nW) is obtained at $V_{ds}$ = 0.54 V (0.5 V) for the monolayer (multilayer) device (Fig. 4d), with a power conversion efficiency of 0.2% (0.6%) (see Methods).

Fig. 4 | The Ag-MoS₂-Pt MSM photodiode with transferred asymmetric Ag-Pt electrodes. a, Schematic illustration and optical image of transferred asymmetric Ag and Pt electrodes on MoS₂. The Ag is grounded, and the Pt electrode is used as the drain. b, c, Semi-logarithmic plot of the $I_{ds}$-$V_{ds}$ output curve of a seven-layer device in the dark (b) and under 532-nm laser illumination (c, 10 nW µm⁻²). The diode demonstrates a high rectification ratio (>10°), a near-unity ideality factor (n = 1.09) and a large $V_{oc}$. d, Linear plot of the $I_{ds}$-$V_{ds}$ output curve in the dark (black line) and under laser illumination (red line), demonstrating a highest $V_{oc}$ of 1.02 V for monolayer MoS₂ (inset) and 0.76 V for seven-layer MoS₂. The gate voltage is −60 V for the monolayer device and −50 V for the seven-layer device. The red dashed lines show the corresponding power area for maximum power conversion. The gate dielectric is composed of 300-nm-thick SiO₂ and 170-nm-thick PMMA for all devices in optoelectronic measurements. $P_{el}$, electrical power.

Our study not only validates the fundamental limit of ideal metal- 
semiconductor interfaces, but also defines a general, low-energy metal 
integration approach that may be extended to other delicate materials 
that would be damaged by aggressive contact fabrication processes (for 
example, degradation in various solvents used in lithography processes 
or in ‘high-energy’ metal deposition) or to other functional interfaces or 
junctions (for example, magnetic-semiconductor or superconductor- 
semiconductor junctions) that were previously limited by interface 
disorder. 


Online content 

Any Methods, including any statements of data availability and Nature Research reporting summaries, along with any additional references and Source Data files, are available in the online version of the paper at https://doi.org/10.1038/s41586-018-0129-8.

METHODS


Metal electrode fabrication, release, transferring and lamination process. We 
first prepare a series of 50-nm-thick metal electrodes with various work functions 
on a silicon substrate with an atomically flat surface by using standard photolithog- 
raphy or electron-beam lithography and high-vacuum electron-beam evaporation. 
Next, a hexamethyldisilazane (HMDS) layer is applied to functionalize the whole 
wafer, and a PMMA layer is then spin-coated on top of the metal electrodes. With 
the pre-functionalization by HMDS, the PMMA layer has weak adhesion to the 
sacrificial substrate and can be mechanically released by using a PDMS stamp, 
together with metal electrodes wrapped underneath (Fig. 1a, ii). The metal elec- 
trodes released using this method are atomically flat (replicating the atomically 
flat surface of the sacrificial wafer), with a mean surface roughness of 0.2-0.3 nm 
(Extended Data Fig. 1). 

In the functionalization step, we place the wafer and metal electrodes in a sealed HMDS chamber at 120°C for 2-30 min and then spin-coat the wafer with the PMMA polymer. Both the required HMDS functionalization time and the PMMA thickness are highly dependent on the metal to be released. For metals with weak adhesion to the silicon wafer (Au, Ag, Pt), the functionalization time is 20-30 min and the PMMA polymer is about 1 µm thick. For metals with intermediate adhesion force (Pd), the treatment time is 5 min and the PMMA polymer is about 2 µm thick. For metals with greater adhesion force (Cu), the treatment time is 3 min and the PMMA polymer is about 5-10 µm thick. For metals with the strongest adhesion force (Ti, Ni, Cr), the treatment time is <2 min and the PMMA polymer is >10 µm thick. However, for Ti, Ni and Cr, the release yield is low, and this release method needs further improvement. Extended Data Fig. 1 shows optical images and photographs taken during the transfer process. For Au electrode transfer, the highest transfer temperature (from PDMS to the target substrate) should be kept below 60°C to maintain the metal-semiconductor vdW gap and to avoid strong interaction and the formation of chemical bonds. The transfer process is conducted in a nitrogen-filled glovebox with a low oxygen level (<0.1 p.p.m.). Furthermore, once the metal is delaminated from the substrate, it is immediately brought into physical contact with the MoS₂, with a short exposure time (typically <20 s), to minimize any possible surface oxide formation on oxygen-sensitive metals (for example, Cu and Ag).

Flat-band Schottky barrier extraction. Our extraction of the Schottky barrier height is based on a thermionic emission model at low doping level, as shown in Extended Data Fig. 4. At low doping level, below the flat-band voltage ($V_{FB}$), charge injection into the MoS₂ channel occurs mainly through thermionic emission, following equation (4). The Schottky barrier $\Phi_{SB}$ can therefore be extracted from Arrhenius plots using the following equation”**!:

$$\ln\left(\frac{I_{ds}}{T^{2}}\right) = -\frac{\Phi_{SB}}{kT} + c \qquad (5)$$

where c is a constant and $\Phi_{SB}$ is the slope of $\ln(I_{ds}/T^{2})$ plotted against $-1/kT$ in the Arrhenius plots. Using equations (4) and (5), we can extract the Schottky barrier height under various gate voltages, as summarized in Extended Data Fig. 4c, f.

To accurately represent thermionic emission at the metal-semiconductor interface, the Schottky barrier at the flat-band gate voltage ($V_{gs} = V_{FB}$) is always used in Fig. 3. Below the flat-band voltage ($V_{gs} < V_{FB}$), the device is in the subthreshold regime and the channel resistance dominates the carrier transport. Above the flat-band voltage ($V_{gs} > V_{FB}$), the contact is highly doped, and a superimposed tunnelling current affects the extracted barrier height, resulting in an apparently smaller $\Phi_{SB}$. In theory, the extracted $\Phi_{SB}$ depends linearly on the gate voltage in the subthreshold regime and gradually becomes sublinear above $V_{FB}$ (refs 7°"). We could therefore use a guide line to extract the $V_{FB}$ value and accurately determine the flat-band Schottky barrier (Extended Data Fig. 4c, f).
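One way to automate the guide-line step is sketched below: fit a straight line to the most negative gate voltages, assumed to be in the subthreshold regime, and take $V_{FB}$ as the last point that still lies on that line within a tolerance. The function name, fit window and 5 meV tolerance are hypothetical choices, not the procedure used for Extended Data Fig. 4, and the input is a synthetic curve that is linear up to 10 V and sublinear beyond.

```python
import numpy as np


def flat_band_point(v_gs, phi_sb, n_fit=4, tol=0.005):
    """Hypothetical helper: return (V_FB, Phi_SB at V_FB).

    A line is fitted to the first n_fit points (most negative V_gs, assumed
    subthreshold); V_FB is the last gate voltage before the extracted barrier
    departs from that line by more than tol (eV).
    """
    v_gs = np.asarray(v_gs, dtype=float)
    phi_sb = np.asarray(phi_sb, dtype=float)
    slope, intercept = np.polyfit(v_gs[:n_fit], phi_sb[:n_fit], deg=1)
    residual = np.abs(phi_sb - (slope * v_gs + intercept))
    departures = np.flatnonzero(residual > tol)
    if departures.size == 0 or departures[0] == 0:
        raise ValueError("could not locate a departure from the linear regime")
    last = departures[0] - 1
    return v_gs[last], phi_sb[last]


# Synthetic curve: linear (slope -10 meV/V) up to 10 V, then sublinear.
v = np.arange(-20.0, 31.0, 2.0)
phi = np.where(v <= 10.0, 0.30 - 0.01 * v, 0.20 - 0.002 * (v - 10.0))
v_fb, phi_fb = flat_band_point(v, phi)
print(f"V_FB ~ {v_fb:.1f} V, flat-band Phi_SB ~ {phi_fb:.3f} eV")
```

With noisy measured data, the tolerance would need to exceed the scatter of the subthreshold points, so a fixed threshold is only a starting point.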

For the extraction of the Schottky barrier height, a bias voltage ($V_{ds}$) of 0.1 V is used. The relationship between $I_{ds}$ and the diode saturation current can be expressed using the typical back-to-back Schottky diode model with a simple current continuity equation:

$$I_F = I_{sat}\left[\exp\left(\frac{qV_F}{kT}\right) - 1\right] \qquad (6)$$

$$I_R = I_{sat}\left[\exp\left(-\frac{qV_R}{kT}\right) - 1\right] \qquad (7)$$

$$I_{ds} = I_F = -I_R \qquad (8)$$

$$V_{ds} = V_F + V_R \qquad (9)$$

where $I_F$ is the current of the forward-biased diode, $V_F$ is the voltage applied to the forward-biased diode, $I_R$ is the current of the reverse-biased diode, $V_R$ is the voltage applied to the reverse-biased diode, q is the elementary charge and $I_{sat}$ is the saturation current to be measured. On the basis of equations (6)-(9) with $V_{ds}$ = 0.1 V, the overall measured current $I_{ds}$ is 96% of $I_{sat}$, approaching the case of a nearly ideal single diode; this indicates that $I_{sat}$ is measured accurately under this condition. Therefore, the relatively small bias voltage used here not only minimizes the superimposed tunnelling current, so that the extracted barrier reflects nearly pure thermionic emission, but is also large enough for the whole system to be viewed as a single diode at the source side.
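Equations (6)-(9) can be solved in closed form: eliminating $V_F$ and $V_R$ gives $I_{ds}/I_{sat} = \tanh(qV_{ds}/2kT)$. The Python sketch below evaluates this expression and, as an independent check, solves the continuity condition by bisection. It assumes two identical diodes (the same $I_{sat}$ on both sides) and a temperature of 300 K; the function names are hypothetical.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant (eV/K); with V in volts, qV/kT = V / (K_B * T)


def current_fraction(v_ds, temp=300.0):
    """Closed form of equations (6)-(9) for identical diodes: I_ds/I_sat = tanh(qV_ds/2kT)."""
    return math.tanh(v_ds / (2.0 * K_B * temp))


def current_fraction_numeric(v_ds, temp=300.0, iters=200):
    """Solve I_F = -I_R (equation 8) with V_F + V_R = V_ds (equation 9) by bisection on V_F."""
    vt = K_B * temp
    lo, hi = 0.0, v_ds
    for _ in range(iters):
        v_f = 0.5 * (lo + hi)
        forward = math.expm1(v_f / vt)               # I_F / I_sat, equation (6)
        reverse = -math.expm1(-(v_ds - v_f) / vt)    # -I_R / I_sat, equation (7)
        if forward < reverse:
            lo = v_f     # forward diode carries too little current: raise V_F
        else:
            hi = v_f
    return math.expm1(0.5 * (lo + hi) / vt)          # I_ds / I_sat


print(f"analytic: {current_fraction(0.1):.3f}, bisection: {current_fraction_numeric(0.1):.3f}")
```

At 300 K and $V_{ds}$ = 0.1 V both routes give $I_{ds}/I_{sat} \approx 0.959$, the 96% figure quoted above; at lower temperatures the ratio moves even closer to unity.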

Impact of the ultrathin vdW gap on carrier transport. Although the non-bonding vdW gap may pose an additional series resistance, the gap is so thin (0.1 nm to 0.15 nm) that this resistance is too small to affect the overall carrier transport and can be largely neglected. To verify this argument quantitatively, we have calculated the tunnelling resistance of a vdW gap at the metal-semiconductor interface using a direct tunnelling model with the following relationship**:

$$J_T = \frac{q^{3} F_{vdW}^{2}}{16\pi^{2}\hbar\,\varphi_B} \exp\left\{-\frac{4\sqrt{2m_{vdW}}\,\varphi_B^{3/2}}{3\hbar q F_{vdW}}\left[1 - \left(1 - \frac{qV_{vdW}}{\varphi_B}\right)^{3/2}\right]\right\} \qquad (10)$$

where $J_T$ is the calculated current density, q is the electron charge, $\hbar$ is the reduced Planck constant, $\varphi_B$ is the energy difference between the MoS₂ conduction band and the vacuum level, $F_{vdW}$ is the electric field within the vdW gap, $m_{vdW}$ is the electron mass within the vdW gap and $V_{vdW}$ is the bias voltage applied across the vdW gap.

As shown in equation (10), accurate evaluation of the tunnelling resistance requires the vdW gap thickness, which we have determined using both theoretical calculations and experimental measurements. In theory, the vdW gap thickness ($T_{theory}$) between the metal and MoS₂ can be calculated by subtracting the atomic radii from the sum of the vdW radii, using the following equation:

$$T_{theory} = \left(r_{vdW(m)} - r_{atom(m)}\right) + \left(r_{vdW(S)} - r_{atom(S)}\right) \qquad (11)$$

where $r_{vdW(S)}$, $r_{vdW(m)}$, $r_{atom(S)}$ and $r_{atom(m)}$ are the vdW radius of sulfur, the vdW radius of the metal, the atomic radius of sulfur and the atomic radius of the metal, respectively. Given $r_{atom}$ and $r_{vdW}$ values of 0.88 Å and 1.8 Å for sulfur; 1.35 Å and 1.4 Å for Cu; 1.4 Å and 1.63 Å for Pd; 1.6 Å and 1.72 Å for Ag; 1.35 Å and 1.66 Å for Au; and 1.35 Å and 1.75 Å for Pt, respectively***°, the calculated vdW gaps are 0.10-0.14 nm for the different metals used.

Additionally, the vdW gap can be directly determined from the cross-sectional TEM image (Fig. 1d) using the following equation:

$$T_{vdW} = d_{Mo-Au} - r_{Au} - r_S - d_{Mo-S} \qquad (12)$$

where $T_{vdW}$ is the vdW gap thickness, $d_{Mo-Au}$ is the measured vdW distance between the Au surface plane and the molybdenum surface plane (about 0.53 nm as measured from Fig. 1d), $r_{Au}$ is the Au atomic radius (0.135 nm), $r_S$ is the sulfur atomic radius (0.088 nm) and $d_{Mo-S}$ is the centre-to-centre distance between the molybdenum surface plane and the sulfur surface plane (0.162 nm).

The experimentally determined $T_{vdW}$ is about 0.15 nm, consistent with theoretical expectations ($T_{theory}$ ≈ 0.10-0.14 nm). Such a thin tunnelling gap results in a series resistance of around 10⁻¹⁰ Ω cm² to 10⁻⁸ Ω cm² according to equation (10), which is several orders of magnitude smaller than the typical MoS₂ contact resistance*” (about 10⁻⁵ Ω cm² to 10⁻³ Ω cm²), and can therefore be largely neglected.

We note that the $T_{vdW}$ value (0.15 nm) determined for the interface between the transferred metal and the semiconductor is comparable to values for other typical vdW interfaces. For example, the calculated $T_{vdW}$ gaps between adjacent layers of graphene and of MoS₂ are 0.20 nm and 0.15 nm, respectively, using equation (12) and the experimentally measured layer-to-layer distances (0.34 nm for graphite and 0.65 nm for molybdenite)**”.
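The arithmetic behind equations (11) and (12) is short enough to reproduce directly. The sketch below uses only the radii and distances quoted in this section. For the MoS₂ interlayer case it assumes that equation (12) is applied with the upper layer's sulfur plane taking the place of the Au surface (so $r_{top} = r_S + d_{Mo-S}$ and the Mo-to-Mo spacing is 0.65 nm); that reading is an interpretation, not something the text states explicitly.

```python
# vdW gap thickness from equations (11) and (12), using values quoted in Methods.

# (atomic radius, vdW radius) in angstroms.
RADII = {
    "S": (0.88, 1.80),
    "Cu": (1.35, 1.40),
    "Pd": (1.40, 1.63),
    "Ag": (1.60, 1.72),
    "Au": (1.35, 1.66),
    "Pt": (1.35, 1.75),
}
R_S = 0.088      # sulfur atomic radius (nm)
D_MO_S = 0.162   # Mo-plane to S-plane distance (nm)


def theoretical_gap_nm(metal):
    """Equation (11): (r_vdW(m) - r_atom(m)) + (r_vdW(S) - r_atom(S)), converted to nm."""
    r_atom_m, r_vdw_m = RADII[metal]
    r_atom_s, r_vdw_s = RADII["S"]
    return ((r_vdw_m - r_atom_m) + (r_vdw_s - r_atom_s)) / 10.0


def measured_gap_nm(d_mo_top, r_top):
    """Equation (12): T_vdW = d_Mo-top - r_top - r_S - d_Mo-S (all in nm)."""
    return d_mo_top - r_top - R_S - D_MO_S


for metal in ("Cu", "Pd", "Ag", "Au", "Pt"):
    print(f"{metal}: T_theory = {theoretical_gap_nm(metal):.3f} nm")

print(f"Au/MoS2 (TEM): T_vdW = {measured_gap_nm(0.53, 0.135):.3f} nm")
# ASSUMPTION: for bilayer MoS2 the upper layer's S plane replaces the Au surface,
# so r_top = r_S + d_Mo-S and d_Mo-top is the 0.65 nm Mo-to-Mo layer spacing.
print(f"MoS2 interlayer: T_vdW = {measured_gap_nm(0.65, R_S + D_MO_S):.3f} nm")
```

With these inputs, equation (11) gives roughly 0.10-0.13 nm across the five metals, and equation (12) gives about 0.145 nm for Au/MoS₂ and 0.15 nm for the MoS₂ interlayer gap.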
Analysis of Ag–MoS2–Pt MSM photodiode. To fabricate an Ag–MoS2–Pt metal–semiconductor–metal (MSM) photodiode, asymmetric electrode pairs consisting of Pt and Ag are first deposited on a sacrificial wafer and then released and physically laminated onto MoS2 using the transfer method described above. To evaluate the photocurrent generation efficiency, we extract the photoresponsivity

$$R = I_{\text{sc}}/P_{\text{laser}} \qquad (13)$$

where $I_{\text{sc}}$ is the short-circuit current and $P_{\text{laser}}$ is the input laser power. The measured R values are 7.2 mA W⁻¹ for the monolayer and 16.6 mA W⁻¹ for the seven-layer MoS2 devices. With R determined, we can further extract the external quantum efficiency

$$\text{EQE} = \frac{Rhc}{e\lambda} \qquad (14)$$

where h, c, e and λ are Planck's constant, the speed of light, the electron charge and the laser wavelength, respectively. As the device produces both large $I_{\text{sc}}$ and $V_{\text{oc}}$, the electrical power $P_{\text{el}}$ can also be extracted. As shown in the dashed rectangular area in Fig. 4d, a maximum electrical output power of 0.3 nW (3.7 nW) is obtained at $V_{\text{ds}}$ = 0.54 V (0.5 V) for the monolayer (multilayer) device. For the fill factor (FF), defined as the ratio of the maximum obtainable power to the product of $V_{\text{oc}}$ and $I_{\text{sc}}$, we obtain FF = $P_{\text{el,max}}/(V_{\text{oc}}I_{\text{sc}}) \approx$ 0.26 (0.47) for the monolayer (multilayer) device. We can now also estimate the power conversion efficiency η, the percentage of the incident light energy that is converted into electrical energy, η = $P_{\text{el,max}}/P_{\text{laser}}$, giving η = 0.2% and 0.61% for the monolayer and seven-layer devices, respectively.
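Equations (13) and (14), together with the fill-factor and efficiency definitions above, reduce to a few ratios. The sketch below is illustrative only: the laser wavelength is not given in this section, so the 532 nm used in the example is a hypothetical value, and the function names are our own.

```python
# Physical constants (exact SI values).
PLANCK_H = 6.62607015e-34            # J s
SPEED_OF_LIGHT = 2.99792458e8        # m s^-1
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def responsivity(i_sc: float, p_laser: float) -> float:
    """Equation (13): R = I_sc / P_laser, in A W^-1."""
    return i_sc / p_laser


def external_quantum_efficiency(r: float, wavelength_m: float) -> float:
    """Equation (14): EQE = R h c / (e lambda), dimensionless."""
    return r * PLANCK_H * SPEED_OF_LIGHT / (ELEMENTARY_CHARGE * wavelength_m)


def fill_factor(p_el_max: float, v_oc: float, i_sc: float) -> float:
    """FF = P_el,max / (V_oc I_sc)."""
    return p_el_max / (v_oc * i_sc)


def power_conversion_efficiency(p_el_max: float, p_laser: float) -> float:
    """eta = P_el,max / P_laser (multiply by 100 for a percentage)."""
    return p_el_max / p_laser


if __name__ == "__main__":
    # Monolayer responsivity from the text; 532 nm is a hypothetical wavelength.
    eqe = external_quantum_efficiency(7.2e-3, 532e-9)
    print(f"EQE = {eqe:.2%}")
```

With the assumed wavelength the monolayer EQE comes out at roughly 1.7%; the value for the actual experiment depends on the laser wavelength used.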

Data availability. The data that support the findings of the current study are available from the corresponding authors upon reasonable request.
Extended Data Fig. 1 | Optical images, photographs and characterization of the transfer process of the metal electrodes. a–d, Optical images of Au electrodes deposited on a SiO2 substrate (a), physically released using 1-μm-thick PMMA (b), attached to a PDMS (with PMMA) substrate (c) and transferred onto the target substrate (d). Scale bars, 200 μm in a–d. e–h, The corresponding photographs of Au electrodes deposited on a SiO2 substrate (e), physically released using 1-μm-thick PMMA (f), attached to a PDMS (with PMMA) substrate (g) and transferred onto the target substrate (h). i, Atomic force microscopy image of the bottom side of the transferred electrodes, with a root-mean-square surface roughness of 0.26 nm.
Extended Data Fig. 2 | Substrate doping effect on MoS2. a, Optical image of a seven-layer MoS2 flake on a SiO2 substrate contacted with transferred Pt electrodes. Inset, the optical image of MoS2 on SiO2 before the metal contact. Scale bar, 20 μm. b, I_ds–V_gs transfer curve of the MoS2 transistor on a SiO2 substrate under various bias voltages of 10 mV (black), 100 mV (red), 500 mV (blue) and 1 V (cyan), demonstrating n-type behaviour and suggesting the involvement of defect states at the SiO2–MoS2 interface. c, Optical image of a MoS2 flake approximately 15 layers thick on a PMMA substrate, contacted with transferred Pt electrodes. Inset, the optical image of MoS2 on PMMA before the metal contact. Scale bar, 20 μm. d, I_ds–V_gs transfer curve of the MoS2 transistor encapsulated in PMMA under various bias voltages of 10 mV (black), 100 mV (red), 500 mV (blue) and 1 V (cyan), demonstrating p-type behaviour and suggesting that the use of a PMMA substrate is essential for preventing substrate pinning effects and retaining the intrinsic properties of MoS2 flakes. All measurements were conducted at room temperature in probe stations.
Extended Data Fig. 3 | Highest-hole-mobility device using transferred Pt as the contact electrodes. a, Optical image of a MoS2 flake on a PMMA/SiO2 substrate. b, Optical image of the MoS2 flake after being contacted by transferred Pt electrodes. The channel length is 13.5 μm and the effective channel width is 8.37 μm. Scale bar in a, b, 10 μm. c, I_ds–V_ds output curve of the MoS2 transistor under various gate voltages from −60 V to 60 V. d, e, Linear (d) and semi-logarithmic (e) plots of the I_ds–V_gs transfer curve of the MoS2 transistor under various bias voltages: 10 mV (black), 100 mV (red), 500 mV (blue) and 1 V (cyan). The purple line is the gate leakage current (I_g), which is an order of magnitude smaller than I_ds (limited by equipment) and will not affect the overall carrier transport. Under large gate voltage, the channel majority carrier is inverted to electrons and the carrier concentration increases exponentially, greatly reducing the electron Schottky barrier width. As a result, electrons can tunnel through the thin Schottky barrier from the source side, which accounts for the observed ambipolar behaviour. f, The extracted two-terminal field-effect hole mobility at various bias voltages: 10 mV (black), 100 mV (red), 500 mV (blue) and 1 V (cyan). The width/length ratio is 0.62. The gate dielectric is composed of 300-nm-thick SiO2 and 170-nm-thick PMMA, with a calculated capacitance of 6.2 nF cm⁻². The highest extracted hole mobility is 175 cm² V⁻¹ s⁻¹. All measurements were conducted at room temperature in probe stations.
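The 6.2 nF cm⁻² gate capacitance corresponds to two parallel-plate capacitors (SiO2 and PMMA) in series. The sketch below assumes relative permittivities of 3.9 for SiO2 and 2.6 for PMMA; these are commonly quoted values, not numbers given in this section.

```python
VACUUM_PERMITTIVITY = 8.8541878128e-12  # F m^-1


def layer_capacitance(eps_r: float, thickness_m: float) -> float:
    """Parallel-plate capacitance per unit area, in F m^-2."""
    return eps_r * VACUUM_PERMITTIVITY / thickness_m


def series_capacitance(*capacitances: float) -> float:
    """Dielectric layers stacked in series: 1/C = sum(1/C_i)."""
    return 1.0 / sum(1.0 / c for c in capacitances)


def to_nf_per_cm2(c_f_per_m2: float) -> float:
    """Convert F m^-2 to nF cm^-2 (1 F m^-2 = 1e9 nF / 1e4 cm^2)."""
    return c_f_per_m2 * 1e5


# Assumed permittivities: SiO2 3.9, PMMA 2.6 (not stated in the text).
c_sio2 = layer_capacitance(3.9, 300e-9)
c_pmma = layer_capacitance(2.6, 170e-9)
c_gate = series_capacitance(c_sio2, c_pmma)
print(f"C_g = {to_nf_per_cm2(c_gate):.1f} nF cm^-2")
```

With these assumptions the stack gives about 6.2 nF cm⁻², in line with the quoted value; a different PMMA permittivity would shift the result.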


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


LETTER 


[Extended Data Fig. 4 panel labels: SB height (meV); V_gs (V).]
Extended Data Fig. 4 | Flat-band Schottky barrier extraction. a, b, I_ds–V_gs transfer curves of a MoS2 transistor using transferred Ag electrodes under various temperatures, with the bias voltage fixed at 100 mV. c, The extracted n-type Schottky barrier height at various gate voltages, where the flat-band electron Schottky barrier is measured to be 20 meV. The flat-band voltage and corresponding Schottky barrier are shown by the dashed lines. d, e, I_ds–V_gs transfer curves of a MoS2 transistor using transferred Pt electrodes under various temperatures, with the bias voltage fixed at 100 mV. f, The extracted p-type Schottky barrier height at various gate voltages, where the flat-band hole Schottky barrier is measured to be 67 meV. The flat-band voltage and corresponding Schottky barrier are shown by the dashed lines. Tran, transferred.
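A common way to obtain barrier heights from temperature-dependent transfer curves such as these is an Arrhenius analysis based on the 2D thermionic-emission model, I_ds ∝ T^(3/2) exp(−qΦ_B/k_BT), applied at each V_gs; the flat-band barrier is then read off where the extracted barrier versus V_gs departs from linear behaviour. This section does not spell out the procedure, so the sketch below is an assumption-based illustration: it neglects the bias-dependent correction to the barrier and is checked only on synthetic data.

```python
import numpy as np

K_B_EV = 8.617333262e-5  # Boltzmann constant, eV K^-1


def effective_barrier_mev(temps_k, currents_a, t_exponent=1.5):
    """Effective Schottky barrier (meV) from an Arrhenius fit.

    Fits ln(I / T^t_exponent) against 1/T; for thermionic emission the slope
    is -Phi_B / k_B. t_exponent = 1.5 is the 2D form (an assumption here).
    """
    temps = np.asarray(temps_k, dtype=float)
    y = np.log(np.asarray(currents_a, dtype=float) / temps**t_exponent)
    slope, _intercept = np.polyfit(1.0 / temps, y, 1)
    return -slope * K_B_EV * 1e3


if __name__ == "__main__":
    # Synthetic currents generated with a 20 meV barrier (illustration only).
    temps = np.linspace(150.0, 300.0, 7)
    currents = 1e-9 * temps**1.5 * np.exp(-0.020 / (K_B_EV * temps))
    print(f"Phi_B = {effective_barrier_mev(temps, currents):.1f} meV")
```

Repeating the fit at every gate voltage gives a barrier-versus-V_gs curve of the kind plotted in panels c and f.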
Extended Data Fig. 5 | Highest-electron-mobility device using transferred Ag as the contact electrodes. a, Optical image of a MoS2 flake on a PMMA/SiO2 substrate. b, Optical image of the MoS2 flake after being contacted by transferred Ag electrodes. The channel length here is 10 μm and the effective channel width is 5.36 μm. Scale bar in a, b, 10 μm. c, I_ds–V_ds output curve of the MoS2 transistor under various gate voltages from −60 V to 60 V. d, e, Linear (d) and semi-logarithmic (e) plots of the I_ds–V_gs transfer curve of the MoS2 transistor under various bias voltages: 10 mV (black), 100 mV (red), 500 mV (blue) and 1 V (cyan). The purple line is the gate leakage current (I_g), which is an order of magnitude smaller than I_ds (limited by equipment) and will not affect the overall carrier transport. Under large gate voltage, the channel majority carrier is inverted to holes and the carrier concentration increases exponentially, greatly reducing the hole Schottky barrier width. As a result, holes can tunnel through the thin Schottky barrier from the drain side, which accounts for the observed ambipolar behaviour. f, The extracted two-terminal field-effect electron mobility at various bias voltages: 10 mV (black), 100 mV (red), 500 mV (blue) and 1 V (cyan). The width/length ratio is 0.54. The gate dielectric is composed of 300-nm-thick SiO2 and 170-nm-thick PMMA, with a calculated capacitance of 6.2 nF cm⁻². The highest extracted electron mobility is 260 cm² V⁻¹ s⁻¹. All measurements were conducted at room temperature in probe stations.
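The two-terminal field-effect mobilities in Extended Data Figs 3f and 5f are normally obtained from the linear-regime relation μ = (L/W) (dI_ds/dV_gs) / (C_g V_ds). The sketch below assumes that relation (the exact extraction routine is not given here) and checks it on synthetic data. W/L = 5.36/10 ≈ 0.54 and C_g = 6.2 nF cm⁻² come from the caption above, while the 260 cm² V⁻¹ s⁻¹ used to generate the synthetic curve is only a test input.

```python
import numpy as np


def field_effect_mobility(v_gs, i_ds, v_ds, c_gate_f_per_cm2, w_over_l):
    """Two-terminal field-effect mobility in cm^2 V^-1 s^-1, point by point.

    mu = |dI_ds/dV_gs| / (C_g * V_ds * W/L), with C_g in F cm^-2. Contact
    resistance is not removed, so this is an extrinsic (two-terminal) value.
    """
    gm = np.gradient(np.asarray(i_ds, dtype=float), np.asarray(v_gs, dtype=float))
    return np.abs(gm) / (c_gate_f_per_cm2 * v_ds * w_over_l)


if __name__ == "__main__":
    w_over_l = 5.36 / 10.0   # from Extended Data Fig. 5 (about 0.54)
    c_gate = 6.2e-9          # 6.2 nF cm^-2 expressed in F cm^-2
    v_ds = 0.01              # 10 mV bias
    v_gs = np.linspace(-60.0, 60.0, 121)
    # Synthetic linear-regime n-type curve with threshold at 0 V (illustration only).
    i_ds = 260.0 * c_gate * w_over_l * v_ds * np.clip(v_gs, 0.0, None)
    mu = field_effect_mobility(v_gs, i_ds, v_ds, c_gate, w_over_l)
    print(f"peak mobility = {mu.max():.0f} cm^2 V^-1 s^-1")
```

Taking the absolute value of the transconductance lets the same function handle the p-type Pt device, where I_ds grows as V_gs becomes more negative.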
Extended Data Fig. 6 | Highest n-type current density using transferred Ag and p-type current density using transferred Pt as the contact electrodes. a–c, Optical images of the initial thin BN flake (a), after MoS2 has been dry-transferred onto BN using an alignment transfer technique (b), and the final device with transferred Ag electrodes (c). The channel length is about 160 nm and the channel width is about 6 μm. The gate dielectric is composed of an approximately 5-nm-thick BN flake and 90-nm-thick SiO2 (rather than the 300-nm-thick SiO2 and 170-nm-thick PMMA dielectric used previously) for larger gate capacitance and stronger gate coupling to ensure the highest driving current. d, e, I_ds–V_ds output curves of the fabricated MoS2 transistor under various gate voltages from −40 V to 40 V. The highest current density is measured to be 0.66 mA μm⁻¹. f, I_ds–V_gs transfer curve of the fabricated MoS2 transistor under various bias voltages. With increasing bias voltage, the OFF current of the device increases owing to the short-channel effect. g–i, Optical images of the initial thin BN flake (g), after MoS2 has been dry-transferred onto BN using an alignment transfer technique (h), and the final device with transferred Pt electrodes (i). The channel length is about 140 nm, the channel width is about 6 μm and the gate dielectric is composed of an approximately 10-nm-thick BN flake and 90-nm-thick SiO2. j, k, I_ds–V_ds output curves of the fabricated MoS2 transistor under various gate voltages from 0 V to −40 V. The highest current density is measured to be 0.21 mA μm⁻¹. Scale bar in a–c and g–i, 10 μm. All measurements were conducted at room temperature in probe stations.
Extended Data Fig. 7 | Photoresponse of a monolayer MoS2 device with transferred Ag and Pt asymmetric electrodes. a, Optical image of monolayer MoS2 mechanically exfoliated on a 170-nm PMMA/300-nm SiO2 substrate. b, Optical image of the device after Ag and Pt asymmetric electrodes are transferred on top of monolayer MoS2. Scale bar in a, b, 10 μm. c, Semi-logarithmic plot of the I_ds–V_ds output curves under various gate voltages (−60 V to 60 V, 10 V step) under dark conditions. The Pt is biased and the Ag is grounded. d, Semi-logarithmic plot of the I_ds–V_ds output curves under various gate voltages (−60 V to 60 V, 10 V step) under laser illumination. e, The I_ds–V_ds output curve under dark conditions and laser illumination, at a gate voltage of −50 V. The highest open-circuit voltage, 1.02 V, is observed in monolayer devices.
Extended Data Fig. 8 | Photoresponse of a multilayer MoS2 device with deposited Ag and Pt asymmetric electrodes. a, Optical image of the device. Scale bar, 5 μm. b, Semi-logarithmic plot of the I_ds–V_ds output curves under various gate voltages (−60 V to 60 V, 10 V step) under dark conditions. The Pt is biased and the Ag is grounded. c, Semi-logarithmic plot of the I_ds–V_ds output curves under various gate voltages (−60 V to 60 V, 10 V step) under laser illumination. d, I_ds–V_ds output curves under dark conditions and laser illumination, at a gate voltage of 10 V. The highest open-circuit voltage, about 0.3 V, is observed.
Extended Data Table 1 | Electrical performance of MoS2 devices


Extrinsic mobility† (cm² V⁻¹ s⁻¹), RT | Reference

Prewireteromes [iar [aoe [Reena [Rio 


Contact phase engineering | 1200 | 0.085 | 46 (electrons)
Graphene contact | ~3000 | 70 (electrons) | Ref. 26
Metal/graphene vdW contact‡ | 51 (electrons) | Ref. 43


p-type 


Presereereona [wo [oat [Tepe [von 


*The channel lengths (L_ch) here are for the device with the highest current density.
†The mobility is extracted from field-effect transistor measurements (FET mobility), except for the Nb-doping device, which uses Hall measurements (Hall mobility).
‡The only higher I_on is achieved in an ultra-short n-channel MoS2 device with an optimized metal/graphene hybrid vdW contact.

n-type results are from refs. 24264043; p-type results are from refs. 444°.
Extended Data Table 2 | Photovoltaic effect in 2D semiconductor-based diodes

Transferred Ag–MoS2–Pt diode | 1 layer | 1.02 | This work
Transferred Ag–MoS2–Pt diode | This work
Evaporated Pd-MoSe-Au diode | Ref. 47
Dual-gate WSe2 p–n diode | Ref. 29
Dual-gate WSe2 p–n diode | Ref. 30
MoS2–WSe2 vertical p–n diode | Ref. 48
MoS2–WSe2 planar p–n diode | Ref. 49
MoS2–WSe2 planar p–n diode | Ref. 50
MoS2–BP planar p–n diode | Ref. 51
WSe2–BN–graphene diode | Ref. 52
Graphene–MoS2–graphene vertical diode | 1 layer–multilayer–1 layer | 0.3 | Ref. 53

Data are from refs. 29, 30 and 47–53.
The effect of hydration number on the interfacial transport of sodium ions


Jinbo Peng, Duanyun Cao, Zhili He, Jing Guo, Prokop Hapala, Runze Ma, Bowei Cheng, Ji Chen, Wen Jun Xie, Xin-Zheng Li, Pavel Jelínek, Li-Mei Xu, Yi Qin Gao, En-Ge Wang & Ying Jiang


Ion hydration and transport at interfaces are relevant to a wide range of applied fields and natural processes. Interfacial effects are particularly profound in confined geometries such as nanometre-sized channels, where the mechanisms of ion transport in bulk solutions may not apply. To correlate atomic structure with the transport properties of hydrated ions, both the interfacial inhomogeneity and the complex competing interactions among ions, water and surfaces require detailed molecular-level characterization. Here we constructed individual sodium ion (Na+) hydrates on a NaCl(001) surface by progressively attaching single water molecules (one to five) to the Na+ ion using a combined scanning tunnelling microscopy and noncontact atomic force microscopy system. We found that the Na+ ion hydrated with three water molecules diffuses orders of magnitude more quickly than other ion hydrates. Ab initio calculations revealed that such high ion mobility arises from the existence of a metastable state, in which the three water molecules around the Na+ ion can rotate collectively with a rather small energy barrier. This scenario would apply even at room temperature according to our classical molecular dynamics simulations. Our work suggests that anomalously high diffusion rates for specific hydration numbers of ions are generally determined by the degree of symmetry match between the hydrates and the surface lattice.

Determination of molecular-level details of hydration processes remains a great challenge, both experimentally and theoretically. Various spectroscopic techniques have been used to identify the structure and dynamics of solvated ions or molecules through vibrational fingerprints. However, all of these techniques suffer from poor spatial resolution and the difficulty of spectral assignment. Molecular simulations have also become powerful tools with which to investigate atomic-scale hydration properties, but the reliability of the results depends critically on many tunable factors. Recently, scanning probe microscopy has provided an opportunity to probe interfacial water at the single-molecule or even submolecular level, but the application to ion hydration systems is not straightforward owing to the lack of controlled methods of preparing individual ion hydrates on the substrates and the high flexibility of their structures. Using an ultrahigh-resolution combined scanning tunnelling microscopy (STM) and qPlus noncontact atomic force microscopy (AFM) system, here we studied the hydrated Na+ ion, an alkali metal ion abundant in natural water and biological solutions.

The Na+ hydrates (Na+·nD2O, n = 1–5) were assembled in a controlled manner by progressively attaching single D2O molecules to Na+ ions, which were extracted from the NaCl surface with a chlorine (Cl)-terminated tip (for detailed procedures, see Methods and Supplementary Fig. 1). We found that the barrier for extracting the Na+ from NaCl was greatly suppressed with the assistance of water.


Figure 1a–e shows the atomic models, STM/AFM images (acquired with a CO tip) and AFM simulations of Na+·nD2O clusters (n = 1–5). From the atomically resolved STM images, it can be clearly seen that Na+·D2O, Na+·2D2O and Na+·3D2O adsorb at the bridge sites, while Na+·4D2O and Na+·5D2O adsorb on top of the Cl− sites. We found that the maximum number of water molecules in the first hydration shell is five (see Supplementary Fig. 2). Further addition of water to Na+·5D2O results in formation of the second and higher hydration shells.

The AFM images of the ion hydrates were acquired within the weak-disturbance region where the high-order electrostatic force is dominant, providing higher resolution and finer details than STM. The charge state of Na in the hydrates can be verified by comparing the AFM images and simulations (Supplementary Fig. 3). In the AFM images, the Na+ ion appears as a dark depression, mainly arising from the electrostatic attraction between the CO-tip apex and the Na+ ion. By contrast, the water molecule was imaged as a bright feature (white arrow in Fig. 1a) surrounded by a dark ring (white dashed curve in Fig. 1a), which are ascribed to the negatively charged O atom and the positively charged D atoms, respectively. The 'standing' water of Na+·3D2O (that is, a water molecule whose molecular plane is perpendicular to the surface, in contrast to the flat-lying water molecules; see the white arrow in Fig. 1c) shows a prominent protrusion caused by the Pauli repulsion force. The fuzzy feature in the AFM image of Na+·D2O (see the blue arrow in Fig. 1a) may result from the flipping of water over Na+ in the presence of the tip (for more experimental evidence, see Supplementary Fig. 4). The AFM simulations based on a molecular mechanics model using a quadrupole tip (Methods) nicely reproduce all the experimental images. The comparison between the submolecular-resolution AFM image and simulation is important in determining the structure of ion hydrates (one example is shown in Supplementary Fig. 5).

Next we explored the transport of those hydrates. To activate their diffusion at low temperature (5 K), we used the inelastic electron tunnelling technique, injecting 'hot' electrons/holes (that is, with larger energy than those at the experimental temperature) into the Au substrate, which transport along the surface and transfer their energy to the hydrates (Fig. 2a). Figure 2b plots the diffusion probability of Na+·3D2O and Na+·3H2O as a function of the bias voltage. It clearly shows a sharp increase around ±150 mV (±170 mV), which coincides with the bending mode of D2O (H2O). Therefore, the diffusion of Na+ hydrates must have been induced by vibrational excitation in a one-electron process (Fig. 2b, inset). The diffusion direction is almost random when using a CO tip. However, the Na+ hydrates tend to diffuse towards the Cl tip, owing to the electrostatic attraction between the Na+ and the Cl at the tip apex (see Supplementary Fig. 6 for experimental evidence and theoretical support). This provides a convenient way to study the diffusion of hydrates.


Fig. 1 | Geometries and high-resolution STM/AFM images of Na+ hydrates. a–e, The atomic models (the first column shows the side view; the second column shows the top view), STM and AFM images (acquired with a CO tip) and AFM simulations of Na+·nD2O clusters (n = 1–5), respectively. H, O, Cl, Na and Au atoms are denoted as white, red, green, purple and yellow spheres, respectively. Square lattices of the NaCl(001) surface arising from the Cl− are depicted in the STM/AFM/simulation images by dashed grids. The white (red) arrows in a and b denote bright protrusions, and the white (red) dashed curves highlight the crooked depressions in the AFM images (AFM simulations). The blue arrow in a denotes the fuzzy feature arising from the flipping of the water molecule. The white (red) arrow in c denotes the standing water in the AFM image (AFM simulation). The set points of the STM images (a–e) are V = 100 mV and I = 20 pA; V = 150 mV and I = 30 pA; V = 100 mV and I = 30 pA; V = 100 mV and I = 50 pA; and V = 100 mV and I = 15 pA, respectively. The tip heights of the experimental (simulated) AFM images (a–e) are 130 pm (7.90 Å), 80 pm (8.10 Å), 100 pm (7.95 Å), 100 pm (8.10 Å) and 100 pm (7.99 Å), respectively. The tip height of the experimental AFM images is referenced to the STM set point on the NaCl surface (100 mV, 50 pA). The tip height in simulations is defined as the vertical distance between the apex atom of the metal tip and the Na+ ion in Na+ hydrates. All the AFM oscillation amplitudes of experimental and simulated images are 100 pm. All the AFM simulations were done with a quadrupole (d_z²) tip (k = 0.75 N m⁻¹, Q = −0.2e, where Q is the magnitude of the quadrupole charge at the tip apex and e is the elementary charge). The images are 1.5 nm × 1.5 nm.

To compare the mobility of different Na+ hydrates, we adopted the following procedure: first, the Cl tip was positioned away from the hydrates at a certain lateral distance (d × the lattice constant of NaCl(001), which is 0.39 nm); second, the bias voltage was ramped slowly while the tip height was kept constant; and third, the tunnelling current experienced a sudden increase at a certain effective bias voltage (V_eff), signalling that the hydrates had reached the tip (Fig. 2c). We found that the as-determined V_eff increases as d increases (Fig. 2d). V_eff shows a negligible difference between positive and negative biases, again revealing the critical role of the vibrational excitation (Fig. 2d).

Although V_eff does not simply represent the diffusion barrier and is subject to various experimental parameters (Supplementary Fig. 7), we can still use this quantity to compare the relative mobility of different hydrates qualitatively. Figure 2e plots V_eff as a function of d for different hydrates. We notice that Na+·3D2O has a much smaller V_eff than the other hydrates. Except for Na+·3D2O, the hydrates never reached the tip for d > 2, even when the bias was increased to 700 mV. We found that the tip may induce large structural changes in Na+·4D2O and Na+·5D2O at d = 2 (Supplementary Fig. 8), leading to a considerable reduction of V_eff. Strikingly, Na+·3D2O can still diffuse to the tip readily at d = 7, with a relatively small V_eff (about 400 mV) (Fig. 2d). This suggests that Na+·3D2O may have an unusually small diffusion barrier.
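The V_eff read-out described above amounts to finding the first sudden jump in a constant-height current–bias ramp. The sketch below is a hypothetical implementation: the jump criterion (a ratio between consecutive current samples above `jump_factor`) and the synthetic ramp are our assumptions, not the authors' analysis code.

```python
import numpy as np

NACL_LATTICE_NM = 0.39  # NaCl(001) lattice constant quoted in the text


def lateral_distance_nm(d: int) -> float:
    """Tip-hydrate lateral distance, d x 0.39 nm."""
    return d * NACL_LATTICE_NM


def find_v_eff(bias_mv, current_pa, jump_factor=2.0):
    """Bias of the first point whose |current| exceeds jump_factor times the
    previous point's |current|; returns None if no such jump occurs."""
    bias = np.asarray(bias_mv, dtype=float)
    current = np.abs(np.asarray(current_pa, dtype=float))
    ratio = current[1:] / np.maximum(current[:-1], 1e-12)  # avoid division by 0
    jumps = np.flatnonzero(ratio > jump_factor)
    return float(bias[jumps[0] + 1]) if jumps.size else None


if __name__ == "__main__":
    # Synthetic ramp: slowly rising current, then a tenfold jump at 320 mV
    # when the hydrate reaches the tip (illustration only).
    bias = np.arange(50.0, 500.0, 10.0)
    current = 5.0 + 0.01 * bias
    current[bias >= 320.0] *= 10.0
    print(f"d = 7 -> {lateral_distance_nm(7):.2f} nm; V_eff = {find_v_eff(bias, current)} mV")
```

A ratio test rather than a fixed current threshold keeps the detection independent of the absolute tunnelling current, which changes with tip height.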

To gain insights into the diffusion pathways of those hydrates, we performed ab initio density functional theory (DFT) calculations (Methods). Indeed, Na+·3H2O has the lowest diffusion barrier (below 80 meV) and the potential energy landscape along the path from the Cl− bridge site to the Cl− atop site is rather flat, while all the other hydrates have barriers well above 200 meV (Fig. 3a). The initial, transition and final states are depicted in Fig. 3b. The diffusion of Na+·nH2O (n = 1–3) is accompanied by the rotation of water around the Na+, whereas Na+·nH2O (n = 4 and 5) follows a translational mode only, with local rearrangements of water. The translational diffusion barrier of Na+·3H2O is almost three times that of the rotational one, while for Na+·2H2O and Na+·H2O the barriers of the two modes are nearly the same (Supplementary Fig. 9).
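To see why a barrier below 80 meV versus barriers well above 200 meV matters, a simple Arrhenius (harmonic transition-state) estimate helps. The sketch below assumes equal attempt frequencies for all hydrates, which the text does not establish, and treats 200 meV as a lower bound for the other hydrates.

```python
import math

K_B_MEV = 8.617333262e-2  # Boltzmann constant, meV K^-1


def hop_rate_ratio(low_barrier_mev: float, high_barrier_mev: float,
                   temperature_k: float) -> float:
    """Arrhenius rate ratio k_low / k_high, assuming equal prefactors."""
    return math.exp((high_barrier_mev - low_barrier_mev) / (K_B_MEV * temperature_k))


if __name__ == "__main__":
    for t in (225.0, 300.0):
        print(f"{t:.0f} K: Na+-3H2O hops ~{hop_rate_ratio(80, 200, t):.0f}x faster")
```

Under these assumptions the lower barrier gives roughly a hundredfold faster hopping at 300 K. At 5 K the Boltzmann factor for even the 80 meV barrier is of order 10⁻⁸¹, so thermal hopping is negligible there, consistent with the need for inelastic electron excitation in the experiments.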

The small diffusion barrier of Na+·3H2O is closely related to the existence of a peculiar metastable state (T3 in Fig. 3b), where the Na+ is located at the top Cl− site of NaCl, in contrast to the bridge site in the initial state. The conversion barrier between the initial and metastable states is only about 50 meV. The three H2O molecules of Na+·3H2O cannot simultaneously satisfy an optimal adsorption configuration either at the bridge site or at the top Cl− site owing to the symmetry mismatch with the tetragonal NaCl(001) surface. However, the structures of Na+·nH2O (n = 1, 2, 4, 5) match well with NaCl(001), stabilizing the hydrates at particular sites (bridge or top sites).

Furthermore, we found that the diffusion of Na+·3H2O is coupled with a collective rotation of water in the metastable state (T6 to T11 in Supplementary Fig. 10). Such a rotation requires the water to make only slight adjustments and break minimal bonds with NaCl, leading to a small barrier (about 80 meV). However, removing one H2O molecule from Na+·3H2O may greatly increase the travelling distance of water during the rotation, while adding one H2O molecule may block the rotational degrees of freedom owing to the perfect symmetry match between Na+·4H2O and NaCl(001). Therefore, it is the degree of symmetry match between the hydrates and the surface that makes the diffusion barrier sensitive to the number of water molecules.

These STM/AFM experiments were performed only at low temperature (5 K), and the calculated diffusion barriers correspond to 0 K. To investigate the surface diffusion at finite temperatures, especially close to room temperature, we carried out classical molecular dynamics simulations (Methods). Figure 4a shows the x–y diffusion trajectories of Na+·nH2O (n = 1–5) during a period of 200 ns at 300 K, showing an extraordinarily high mobility of Na+·3H2O (also see Supplementary Videos 1–5). In the zoomed-in image (Fig. 4b), two different hopping behaviours were observed. Na+·H2O and Na+·2H2O hop between the bridge sites (Supplementary Videos 6 and 7), while Na+·4H2O and Na+·5H2O hop between the top Cl− sites (Supplementary Videos 9 and 10). Interestingly, Na+·3H2O exhibits a composite behaviour that contains both hopping patterns (Supplementary Video 8). The calculated diffusion mean-square displacements (MSD) at different temperatures are shown in Fig. 4c, revealing that the specific hydration-number effect persists even at room temperature. Strikingly, the mobility of Na+·3H2O is more than one order of magnitude larger than that of the other clusters at 225 K. We also notice that the diffusion of Na+·3H2O is much faster than that of Na+ in bulk solution.

[Fig. 2 panel labels: a, STM tip, Na+ hydrate, Cl, NaCl; b, bias (mV), −bias and +bias; d, d (×0.39 nm), −bias and +bias.]

Fig. 2 | Tip-induced diffusion dynamics of Na+ hydrates. a, Schematic diagram of the Au-mediated inelastic electron excitation of the Na+ hydrates with the STM tip at a lateral distance of d × the lattice constant of NaCl(001), which is 0.39 nm. The flow direction of hot electrons is indicated by the green dotted arrow. b, Bias dependence of the diffusion probability of Na+·3D2O and Na+·3H2O with a CO tip at d = 4. The voltage pulse duration for each event is 1.2 s. The diffusion probability is obtained from statistics over 50 events. The threshold bias for D2O is indicated by two black arrows. The inset shows the current dependence of the diffusion rate of Na+·3D2O with a CO tip at d = 2 under 170 mV. Error bars of the current reflect the standard deviation from the set point. Error bars of the diffusion rate reflect the errors of exponential fitting of the lifetime distribution. The solid line is the least-squares fit to the data with a power law, R ∝ I^N, where N = 1.02 ± 0.08, indicating a one-electron process. c, Current–bias relationship of Na+·3D2O with a Cl tip at d = 2, where the current jumps occur at V_eff. d, Lateral distance dependence of the positive (red) and negative (black) V_eff for Na+·3D2O with a Cl tip. e, Comparison of V_eff for Na+·nD2O (n = 1–5) at d = 2, 3 and 4. Error bars in d and e reflect the standard deviation (up to 8 different datasets). The tip height in b–e is −165 pm, referenced to the STM set point on NaCl (100 mV, 10 pA).
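The power-law fit in the inset of Fig. 2b, R ∝ I^N, is a straight-line fit in log–log coordinates, and N ≈ 1 indicates that a single tunnelling electron triggers each hop. The sketch below assumes an unweighted least-squares fit on synthetic data; the caption does not say whether the fit was weighted by the error bars.

```python
import numpy as np


def power_law_exponent(current, rate):
    """Fit log(rate) = N log(current) + c by least squares; return (N, c)."""
    log_i = np.log(np.asarray(current, dtype=float))
    log_r = np.log(np.asarray(rate, dtype=float))
    n, c = np.polyfit(log_i, log_r, 1)
    return n, c


if __name__ == "__main__":
    # Synthetic one-electron data: rate proportional to current, with 5% noise.
    rng = np.random.default_rng(0)
    current_pa = np.array([20.0, 40.0, 80.0, 160.0, 320.0])
    rate_hz = 0.01 * current_pa * rng.normal(1.0, 0.05, current_pa.size)
    n, _ = power_law_exponent(current_pa, rate_hz)
    print(f"N = {n:.2f}")
```

An exponent close to 2 would instead point to a two-electron process, which is why the fitted N distinguishes the excitation mechanism.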

Supplementary Fig. 11 and Fig. 4d plot the free-energy landscapes of different Na+ hydrates (Methods). The free-energy minima for Na+·nH2O (n = 1, 2) and Na+·nH2O (n = 4, 5) are located at the bridge sites and the top Cl− sites, respectively (Supplementary Fig. 11a–d). By contrast, the free-energy surface of Na+·3H2O shows local minima at the Cl− sites in addition to the global minima at the bridge sites (Fig. 4d); these local minima can greatly facilitate the diffusion by truncating the barrier (Supplementary Fig. 11i). From the density distributions of Na+ and H2O of the most stable and metastable Na+·3H2O (Fig. 4e, f), we can identify two characteristic triangular structures (see black dashed triangles and insets, and Supplementary Video 8), closely resembling the initial/final and transition (T3) states obtained by DFT (Fig. 3b). The triangular structures of metastable Na+·3H2O are distributed in four equivalent configurations (see the grey dashed triangles in Fig. 4f), arising from the small rotational energy barrier (about 80 meV) of the three water molecules around the Na+, which is much lower than the translational barrier (about 220 meV) (see Supplementary Fig. 9).
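Free-energy landscapes like those in Fig. 4d and Supplementary Fig. 11 are commonly obtained by Boltzmann inversion of sampled positions, F(x, y) = −k_BT ln P(x, y). The Methods for this step are not included in this section, so the sketch below is an assumed, generic version built on a 2D histogram; the bin choices and the synthetic check are ours.

```python
import numpy as np

K_B_MEV = 8.617333262e-2  # Boltzmann constant, meV K^-1


def free_energy_surface(x, y, bins, value_range, temperature_k):
    """Boltzmann inversion F = -k_B T ln P of sampled x-y positions, in meV.

    The minimum is shifted to 0; bins that were never visited return +inf.
    """
    hist, x_edges, y_edges = np.histogram2d(x, y, bins=bins, range=value_range,
                                            density=True)
    with np.errstate(divide="ignore"):
        free_energy = -K_B_MEV * temperature_k * np.log(hist)
    free_energy -= free_energy[np.isfinite(free_energy)].min()
    return free_energy, x_edges, y_edges


if __name__ == "__main__":
    # Synthetic two-site population with a 3:1 occupancy ratio (illustration only).
    x = np.array([0.0] * 300 + [1.0] * 100)
    y = np.zeros_like(x)
    f, _, _ = free_energy_surface(x, y, bins=(2, 1),
                                  value_range=[[-0.5, 1.5], [-0.5, 0.5]],
                                  temperature_k=300.0)
    print(f"Delta F = {f[1, 0]:.1f} meV")  # equals k_B T ln 3 at 300 K
```

In this picture a local minimum such as the top Cl− site of Na+·3H2O simply corresponds to a secondary peak in the sampled position density.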

More generally, the specific hydration-number effect observed in this work may also exist for other salt ions (Li+, K+, Cl− and so on), but the hydration number can differ depending on the size and the hydration asymmetry of positive and negative ions (Supplementary Fig. 12). Therefore, our results point to a new way to control ion transport in nanofluidic systems by interfacial symmetry engineering. The techniques developed in this work can easily be extended to different ions and other hydration systems, opening up the possibility of studying various hydration processes down to the atomic scale.

Fig. 3 | Calculated diffusion barriers of Na+ hydrates by DFT. a, Energy profiles for the diffusion of Na+·nH2O (n = 1–5). The diffusion barriers are compared in the inset. b, Snapshots of Na+ hydrates along the transition path. The first column, the middle three columns and the last column represent the initial state, transition states and final state of Na+·nH2O (n = 1–5), respectively. The number m in the Tm labels at the top left of the images corresponds to the (m + 1)th data point in a.


[Fig. 4 panel labels: x (Å); MSD (Å²); energy (meV); most stable state.]

Fig. 4 | Molecular dynamics simulations of the diffusion of Na+ hydrates at high temperatures. a, x–y trajectories of Na+·nH2O (n = 1–5) during a period of 200 ns at 300 K. b, Zoomed-in image of a showing different diffusion behaviours. The positions of Na+ in two consecutive steps are connected by dotted lines. c, MSD in 1 ns of Na+·nH2O (n = 1–5) between 225 K and 300 K. Error bars reflect the standard deviation from ten different datasets. d, The free-energy landscape experienced by Na+·3H2O. It shows global minima at bridge sites (blue arrow) and local minima at top Cl− sites (red arrow). The positions of Na+ and Cl− of the underlying NaCl(001) surface are labelled. e, f, Density distributions of the most stable and metastable Na+·3H2O on the same area of NaCl(001) as d at 300 K. The blue and red dots represent the positions of Na+ in the hydrate and of the O atom in H2O, respectively. The position of Na+ is constrained within an elliptical area centred at the bridge site (semi-major axis 0.5 Å and semi-minor axis 0.25 Å, e) and a circular area centred at the top Cl− site (radius 0.25 Å, f). For clarity, the three water molecules within a representative Na+·3H2O cluster are connected with black dashed or grey lines. Insets of e and f are snapshots of Na+·3H2O at the bridge and top Cl− sites, respectively. The images in d–f are 0.8 nm × 0.8 nm; the insets in e and f are 1.2 nm × 1.2 nm.
METHODS


STM/AFM experiments. All the experiments were performed with a combined noncontact AFM/STM system (Createc, Germany) at 5 K using a qPlus sensor equipped with a tungsten (W) tip (spring constant k0 ≈ 1,800 N m−1, resonance frequency f0 = 23.7 kHz and quality factor Q ≈ 80,000). The NaCl(001) bilayer film was obtained by thermally evaporating NaCl crystals onto a clean Au(111) surface at room temperature. To reduce the instability of water molecules induced by vibrational excitation by inelastic electrons, we used deuterated water (D2O) instead of H2O in the experiment. Ultrapure D2O (Sigma Aldrich, hydrogen-depleted) was used and further purified under vacuum by several freeze-and-pump cycles to remove remaining impurities. The D2O molecules were dosed in situ onto the sample surface at 5 K through a dosing tube. Bias voltage refers to the sample voltage with respect to the tip. All of the STM topographic images and the AFM frequency shift (Δf) images were obtained with CO-terminated tips in constant-current and constant-height mode, respectively. The CO tip was obtained by positioning the tip over a CO molecule on the NaCl island at a set point of 100 mV and 20 pA, followed by increasing the bias voltage to 200 mV. The controllable manipulation of water molecules was achieved with the Cl−-terminated tip at the set point V = 10 mV, I = 150 pA. The Cl tip was prepared by scanning the NaCl surface in closer proximity (below V = 5 mV and I = 2.5 nA) with a bare metal tip until a Cl atom hopped onto the tip apex. The construction of the Na+ hydrates was done with the Cl tips (for details see Supplementary Fig. 1).

DFT calculations. DFT calculations were performed using the Vienna ab initio simulation package (VASP). Projector augmented wave pseudopotentials were used with a cut-off energy of 550 eV for the expansion of the electronic wave functions. Van der Waals corrections for dispersion forces were considered by using the optB86b-vdW functional. In our study, the system consisted of a bilayer NaCl(001) on top of a Au(111) substrate modelled by a four-layer slab, unless otherwise mentioned. Similar to ref. 18, a (2 × 2) NaCl(001) unit cell on a slightly deformed 6 … superstructure of the Au(111) substrate was constructed as the surface model. The lattice constant for the NaCl(001) surface was set to 3.9 Å, the same as in the experiment, and the Au(111) substrate had a residual strain of about 2%. Supercells of this surface model were used to make the error from water–image interactions negligible, with Monkhorst–Pack k-point meshes of spacing denser than 2π × 0.0064 Å−1. The thickness of the vacuum slab was larger than 16 Å and the dipole correction was applied along the surface normal direction. The Au substrate and the bottom layer of the NaCl were fixed in simulations. Based on Bader charge analysis, the Na in the hydrates was positively charged, with a charge nearly identical to that of the Na+ of the NaCl substrate. Similar to ref. 22, the Cl−-terminated tip was modelled using a three-layer Au pyramid of a [111] cleaved face with a Cl atom attached at the end. Energy barriers and paths for the diffusion of the hydration clusters were determined using the cNEB method. The geometry optimizations and the cNEB calculations were performed with force criteria of 0.01 eV Å−1 and 0.02 eV Å−1, respectively. The binding energies (E_bind) were calculated by subtracting the total energy of the Na+ hydrates on the NaCl(001)/Au(111) structure from the sum of the energies of the relaxed bare NaCl(001)/Au(111) substrate, the gas-phase Na and the corresponding isolated water molecules in the gas phase (see Supplementary Fig. 13):

$$E_{\mathrm{bind}} = E\big[\mathrm{NaCl(001)/Au(111)}\big]_{\mathrm{relaxed}} + n \times E\big[\mathrm{H_2O}\big]_{\mathrm{gas}} + E\big[\mathrm{Na}\big]_{\mathrm{gas}} - E\big[\mathrm{NaCl(001)/Au(111)} + \mathrm{Na^+}\cdot n\mathrm{H_2O}\big]_{\mathrm{relaxed}}$$
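As a sketch of the bookkeeping in the binding-energy formula, the function below combines four total energies exactly as written there. The function name, argument names and the placeholder energies are hypothetical; real inputs would be VASP total energies computed with identical settings.

```python
def binding_energy(e_substrate, e_water_gas, e_na_gas, e_complex, n_water):
    """E_bind for an adsorbed Na+·nH2O cluster, following the formula above.

    e_substrate : total energy of the relaxed bare NaCl(001)/Au(111) slab (eV)
    e_water_gas : total energy of one isolated gas-phase water molecule (eV)
    e_na_gas    : total energy of an isolated gas-phase Na atom (eV)
    e_complex   : total energy of the relaxed slab with the adsorbed hydrate (eV)
    n_water     : number of water molecules n in the hydrate
    A positive result means the adsorbed hydrate is more stable than the
    separated fragments.
    """
    if n_water < 0:
        raise ValueError("n_water must be non-negative")
    return e_substrate + n_water * e_water_gas + e_na_gas - e_complex


# Hypothetical placeholder energies in eV (not values from this study).
e_bind = binding_energy(e_substrate=-1000.0, e_water_gas=-14.2,
                        e_na_gas=-0.1, e_complex=-1045.0, n_water=3)
print(f"E_bind = {e_bind:.2f} eV")
```

Keeping n as an explicit argument makes it easy to tabulate E_bind, or E_bind per water molecule, across the series n = 1–5.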
Simulations of AFM images. The Δf images were simulated with a molecular mechanics model including the electrostatic force, based on the methods described in refs 41,42. We used the following parameters of the flexible probe-particle tip model: effective lateral stiffness k = 0.75 N m−1 and effective atomic radius R = 1.661 Å. For all the AFM simulations, we added a quadrupole-like charge distribution at the tip apex to simulate the CO tip. Comparison between the AFM images and theoretical simulations reveals that the key to the ultrahigh-resolution imaging lies in probing the weak high-order electrostatic force between the quadrupole-like CO-terminated tip and the polar water molecules or ions at large tip–sample distances, in clear contrast to traditional high-resolution AFM imaging at close distances, where Pauli repulsion dominates. This weak interaction allows the imaging and structural determination of the weakly bonded hydrates without inducing any disturbance. The input electrostatic potentials of the Na+ hydrates on NaCl(001) employed in the AFM simulations were obtained from DFT calculations. Parameters of the Lennard-Jones pairwise potentials for all elements are listed in Supplementary Table 1.
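The Methods do not state how simulated forces were turned into Δf values. One common route, assumed here, is the small-amplitude approximation Δf ≈ −f0/(2k0)·∂Fz/∂z, with the sensor values f0 = 23.7 kHz and k0 ≈ 1,800 N m−1 from the experimental section (not the 0.75 N m−1 lateral stiffness of the probe particle). The finite oscillation amplitude of a real experiment is ignored, and the function name is hypothetical.

```python
import numpy as np

F0_HZ = 23.7e3        # qPlus resonance frequency f0 (from the Methods)
K0_N_PER_M = 1800.0   # qPlus spring constant k0 (from the Methods)


def frequency_shift(z_m, fz_n, f0=F0_HZ, k0=K0_N_PER_M):
    """Small-amplitude estimate df(z) = -f0 / (2 k0) * dFz/dz.

    z_m  : tip heights in metres (strictly monotonic)
    fz_n : vertical tip-sample force in newtons at each height
    Returns the frequency shift in Hz at each height.
    """
    z = np.asarray(z_m, dtype=float)
    fz = np.asarray(fz_n, dtype=float)
    dfz_dz = np.gradient(fz, z)  # force gradient in N/m (finite differences)
    return -f0 / (2.0 * k0) * dfz_dz


# Toy force curve: a uniform gradient of 0.36 N/m gives df = -2.37 Hz everywhere.
z = np.linspace(3e-10, 6e-10, 31)
df = frequency_shift(z, 0.36 * z)
```

An attractive force that strengthens as the tip approaches has a positive gradient, so it yields a negative Δf, consistent with the usual sign convention.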

Molecular dynamics simulations. All the molecular dynamics simulation results shown in the paper were obtained by using polarizable force field parameters that were developed based on molecular dynamics in electronic continuum theory. The model reproduces a range of physical and chemical properties of sodium chloride, including the density and surface tension of the pure crystal system, and the viscosity, dielectric constant and diffusion coefficient in solution.

The polarizable force field we used is based on a pairwise additive potential that includes a Coulombic treatment of the electrostatic interactions and a Lennard-Jones representation of dispersion attraction and core repulsion. In this formulation, the potential energy E_ij between any pair of non-bonded atoms (i and j) in a system composed of the ions and water molecules is expressed as the sum of the van der Waals interaction energy E_vdW and the Coulombic interaction energy E_Coulombic, namely

$$E_{ij} = 4\varepsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right] + \frac{\lambda_i \lambda_j q_i q_j}{4\pi\varepsilon_0 r_{ij}}$$

Here, r_ij is the distance between the two atoms; q_i and q_j are the point charges of the atoms; ε0 is the permittivity of vacuum; λ_i and λ_j describe the Coulombic polarizable effect of the atoms; σ_ij and ε_ij are the distance at which the interparticle potential is zero and the well depth of the Lennard-Jones potential, respectively. Lorentz–Berthelot combination rules were used to describe the van der Waals interaction between two different kinds of atoms. Our force field parameters are shown in Supplementary Tables 2 and 3.
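The pair energy and the Lorentz–Berthelot mixing step can be sketched as follows. This assumes that λ_i and λ_j act as multiplicative factors on the point charges and uses kcal mol−1, Å and elementary-charge units. The parameter values in the checks are arbitrary; the actual force-field values are those in Supplementary Tables 2 and 3, and the function names are hypothetical.

```python
import math

# 1/(4*pi*eps0) expressed in kcal mol^-1 Å e^-2 (charges in e, distances in Å).
COULOMB_KCAL_MOL_A = 332.0637


def lorentz_berthelot(sigma_i, sigma_j, eps_i, eps_j):
    """Arithmetic mean of sigma and geometric mean of epsilon."""
    return 0.5 * (sigma_i + sigma_j), math.sqrt(eps_i * eps_j)


def pair_energy(r, q_i, q_j, lam_i, lam_j, sigma_i, sigma_j, eps_i, eps_j):
    """E_ij = 4 eps_ij [(sigma_ij/r)^12 - (sigma_ij/r)^6] + lam_i lam_j q_i q_j / (4 pi eps0 r).

    Assumes lam_i and lam_j scale the point charges multiplicatively.
    """
    sigma_ij, eps_ij = lorentz_berthelot(sigma_i, sigma_j, eps_i, eps_j)
    sr6 = (sigma_ij / r) ** 6
    e_vdw = 4.0 * eps_ij * (sr6 * sr6 - sr6)        # Lennard-Jones term
    e_coulomb = COULOMB_KCAL_MOL_A * lam_i * lam_j * q_i * q_j / r
    return e_vdw + e_coulomb
```

Two quick sanity checks: for neutral atoms, the Lennard-Jones term reaches its minimum of −ε_ij at r = 2^(1/6)σ_ij. Two unit charges 1 Å apart with λ = 1 give about 332 kcal mol−1.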

To test the effect of force field parameters on the simulated results, we also used a non-polarizable force field, in which the SPC/E model was used for water and the ion parameters were taken from Joung and Cheatham. The results are consistent with those obtained with the polarizable force field.

All classical molecular dynamics simulations were performed using the AMBER14 suite of programs. A four-layer NaCl crystal (18 × 18 × 4 atoms) with a (001) surface was built to support the Na+ hydrates. Each simulation system was first subjected to 5,000 steps of steepest descent energy minimization, followed by 5,000 steps of conjugate gradient optimization. Then, a 100-ps molecular dynamics simulation was performed to heat the system up to the target temperature, followed by a 10-ns-long normal molecular dynamics simulation to further relax the system. After the initial equilibration, we performed 200-ns calculations for each system with a time step of 2 fs. The temperature was controlled using Langevin dynamics with a collision frequency of 1.0 ps−1; simulations using a Nosé–Hoover thermostat yield the same results. The bottom layer of the NaCl crystal was restrained with a force constant of 2,000 kcal mol−1 Å−2, and all classical molecular dynamics simulations were carried out with periodic boundary conditions in the crystal plane. The SHAKE algorithm was used to constrain all bonds involving hydrogen atoms. A cut-off of 1.0 nm was used for van der Waals interactions. A long-range dispersion correction based on an analytical integral assuming an isotropic, uniform bulk particle distribution beyond the cut-off was added to the van der Waals energy and pressure.
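For a single species with a dispersion term −C6/r⁶ (C6 = 4εσ⁶ for a Lennard-Jones pair), setting g(r) = 1 beyond the cut-off r_c gives the textbook corrections E_corr = −(2/3)πNρC6/r_c³ and P_corr = −(4/3)πρ²C6/r_c³. The sketch below evaluates these expressions in consistent units of your choice. It handles only one species; a mixture would need a sum over atom-type pairs. The function name is hypothetical, and this is not a reproduction of AMBER's internal code.

```python
import math


def dispersion_tail(n_particles, volume, c6, r_cut):
    """Tail corrections for U(r) = -C6 / r^6 beyond r_cut, assuming g(r) = 1.

    Returns (energy_correction, pressure_correction) for one species, where
    rho = N / V. Units follow the inputs (e.g. kcal/mol, Å, Å^3).
    """
    rho = n_particles / volume
    # E = (N/2) * rho * integral_{rc}^{inf} 4 pi r^2 (-C6 / r^6) dr
    energy = -2.0 * math.pi * n_particles * rho * c6 / (3.0 * r_cut ** 3)
    # P = -(2 pi / 3) rho^2 * integral_{rc}^{inf} r^3 dU/dr dr
    pressure = -4.0 * math.pi * rho ** 2 * c6 / (3.0 * r_cut ** 3)
    return energy, pressure
```

Both corrections are negative, because the neglected dispersion tail is attractive. They also shrink as 1/r_c³, which is why a 1.0-nm cut-off leaves a small but non-negligible remainder.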

Owing to the limitations of our computational resources and the inherently stochastic nature of diffusion calculated for small numbers of atoms, it is very difficult to obtain accurate diffusion coefficients (D) for different hydrates. By contrast, bulk calculations sample the trajectories of many more molecules in a more homogeneous environment than the current calculations of a cluster on the crystal surface, and thus yield converged D values faster. Instead, we simply used the MSD for every nanosecond (up to 20 ns) to compare the mobility of different hydrates. We took the average of the MSD from ten different sets of 20-ns data, and the error bar reflects the standard deviation.
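A minimal sketch of the MSD averaging described above. It assumes the lateral (x–y) MSD of the Na+ position, averaged over all time origins within each block. It also assumes that the ten 20-ns sets are contiguous slices of the 200-ns run, which is consistent with the run lengths but not stated explicitly. Frame spacing and function names are hypothetical.

```python
import numpy as np


def msd_at_lag(xy, lag):
    """Lateral MSD of one ion at a fixed lag (in frames), averaged over time origins.

    xy : array of shape (n_frames, 2) with x, y positions (e.g. in Å)
    """
    xy = np.asarray(xy, dtype=float)
    if not 1 <= lag < len(xy):
        raise ValueError("lag must be between 1 and n_frames - 1")
    disp = xy[lag:] - xy[:-lag]
    return float(np.mean(np.sum(disp ** 2, axis=1)))


def msd_mean_std(xy, n_sets, lag):
    """Split a trajectory into n_sets contiguous blocks; return the mean and
    sample standard deviation of the per-block MSD at the given lag."""
    blocks = np.array_split(np.asarray(xy, dtype=float), n_sets)
    values = np.array([msd_at_lag(block, lag) for block in blocks])
    return values.mean(), values.std(ddof=1)


# Example assumption: frames saved every 10 ps, so a 1-ns lag is 100 frames,
# and a 200-ns run is split into ten 20-ns sets.
```

With real data, the call would be along the lines of `msd_mean_std(na_xy, n_sets=10, lag=100)`, where `na_xy` is a hypothetical array of Na+ positions.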

The equilibrium fractional population distribution of Na+ hydrates at different sites follows a Boltzmann distribution. The equilibrium ratio of state i is

$$\frac{N_i}{N_{\mathrm{total}}} = \frac{e^{-E_i/RT}}{\sum_k e^{-E_k/RT}}$$

where e is Euler's number and E_i is the energy of the ith state relative to the minimum-energy state. R is the molar ideal gas constant and T is the temperature. At room temperature, the configurations of hydrates were fully sampled. Thus we used the equilibrium ratio to calculate the free-energy landscape of different states, which is

$$\Delta G_i = -RT \ln\frac{N_i}{N_{\mathrm{total}}}$$
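The two expressions above are inverses of each other up to a constant offset. The sketch works per particle, with k_B in meV K−1 to match the meV scale of Fig. 4d; this is equivalent to using R with molar energies. It assumes that populations are counted from a fully sampled trajectory (for example, a histogram of Na+ positions). The function names are hypothetical.

```python
import numpy as np

K_B_MEV_PER_K = 8.617333262e-2  # Boltzmann constant in meV/K (R per particle)


def boltzmann_populations(energies_mev, temperature_k):
    """N_i / N_total = exp(-E_i/kT) / sum_k exp(-E_k/kT), with E relative to the minimum."""
    e = np.asarray(energies_mev, dtype=float)
    e = e - e.min()
    weights = np.exp(-e / (K_B_MEV_PER_K * temperature_k))
    return weights / weights.sum()


def free_energy_from_counts(counts, temperature_k):
    """dG_i = -kT ln(N_i / N_total), shifted so the most populated state is 0 meV.

    States that were never visited get +inf.
    """
    counts = np.asarray(counts, dtype=float)
    with np.errstate(divide="ignore"):
        g = -K_B_MEV_PER_K * temperature_k * np.log(counts / counts.sum())
    return g - g.min()
```

Because ln N_total and ln Σ_k e^{−E_k/kT} are shared by all states, shifting the minimum to zero recovers the relative energies from the populations.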


The water orientational time correlation functions C(t) were calculated as C(t) = ⟨P2[μ_wat(0)·μ_wat(t)]⟩, where P2 is the second-order Legendre polynomial and μ_wat(t) is the direction (unit) vector of the water dipole at time t.
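A sketch of C(t) with P2(x) = (3x² − 1)/2. It assumes that the angle brackets denote an average over all time origins and over all water molecules. Dipole vectors are normalized first, so any dipole length can be passed in. The array layout and function names are hypothetical.

```python
import numpy as np


def legendre_p2(x):
    """Second-order Legendre polynomial P2(x) = (3x^2 - 1) / 2."""
    return 0.5 * (3.0 * x ** 2 - 1.0)


def orientational_tcf(dipoles, max_lag):
    """C(t) = <P2[mu(0) . mu(t)]> for t = 0..max_lag frames.

    dipoles : array of shape (n_frames, n_molecules, 3) with water dipole vectors
    Averages over all time origins and all molecules.
    """
    mu = np.asarray(dipoles, dtype=float)
    mu = mu / np.linalg.norm(mu, axis=-1, keepdims=True)  # unit vectors
    n_frames = mu.shape[0]
    if max_lag >= n_frames:
        raise ValueError("max_lag must be smaller than the number of frames")
    c = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        cos = np.sum(mu[: n_frames - lag] * mu[lag:], axis=-1)  # mu(t0) . mu(t0 + lag)
        c[lag] = legendre_p2(cos).mean()
    return c
```

C(0) = 1 by construction. For isotropic reorientation C(t) decays towards 0, whereas a persistent non-zero plateau indicates orientations that are partly locked.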

Data availability. The data that support the findings of this study are available 
from the corresponding author on reasonable request. 
The origin of squamates revealed by a Middle
Triassic lizard from the Italian Alps 


Tiago R. Simões, Michael W. Caldwell, Mateusz Tałanda, Massimo Bernardi, Alessandro Palci, Oksana Vernygora, Federico Bernardini, Lucia Mancini & Randall L. Nydam


Modern squamates (lizards, snakes and amphisbaenians) are, along with birds, the world's most diverse group of tetrapods, and have a long evolutionary history, with the oldest known fossils dating from the Middle Jurassic period, 168 million years ago. The evolutionary origin of squamates is contentious because of several issues: (1) a fossil gap of approximately 70 million years between the oldest known fossils and their estimated origin; (2) limited sampling of squamates in reptile phylogenies; and (3) conflicts between morphological and molecular hypotheses regarding the origin of crown squamates. Here we shed light on these problems by using high-resolution microfocus X-ray computed tomography data from the articulated fossil reptile Megachirella wachtleri (Middle Triassic period, Italian Alps). We also present a phylogenetic dataset combining fossils and extant taxa, and morphological and molecular data. We analysed this dataset under different optimality criteria to assess diapsid reptile relationships and the origins of squamates. Our results reshape the diapsid phylogeny and present evidence that M. wachtleri is the oldest known stem squamate. Megachirella is 75 million years older than the previously known oldest squamate fossils, partially filling the fossil gap in the origin of lizards, and indicates a more gradual acquisition of squamatan features in diapsid evolution than previously thought. For the first time, to our knowledge, morphological and molecular data are in agreement regarding early squamate evolution, with geckoes, and not iguanians, as the earliest crown clade squamates. Divergence time estimates using relaxed combined morphological and molecular clocks show that lepidosaurs and most other diapsids originated before the Permian/Triassic extinction event, indicating that the Triassic was a period of radiation, not origin, for several diapsid lineages.

Megachirella preserves traits that indicate that it is a lepidosaurian reptile, such as the presence of a well-developed quadrate conch, an ectepicondylar foramen in the humerus and pleurodont dentition. Some of these features led previous authors to recognize the lepidosauromorph affinities of Megachirella, which was previously considered a non-squamate lepidosauromorph, although no definitive conclusions on its phylogenetic placement had ever been reached. Yet, the unique condition of Megachirella as one of the very few articulated and well-preserved Triassic lepidosauromorphs hints at its potential to help resolve important aspects of lepidosaur evolution. Here we provide substantial new information on Megachirella, based on personal observations and high-resolution microfocus X-ray computed tomography (micro-CT) scans, which reveal several previously unnoticed features in Megachirella (Fig. 1, Extended Data Figs. 1, 2 and Supplementary Discussion). Results from the micro-CT scans include a combination of features that are found uniquely in squamates: a triradiate squamosal (not tetraradiate as in most other diapsids, including rhynchocephalians); a squamosal that lacks an anteriorly concave articulatory facet for the postorbital; a well-developed alar process of the prootic; a well-developed radial condyle on the humerus; an ulnar patella; a secondary curvature of the clavicles; and an expanded epiphysis of the first metacarpal along with the absence of the first distal carpal (suggesting its fusion with the first metacarpal, as observed in modern squamates). Finally, the micro-CT scans indicate that Megachirella has features that are absent in all rhynchocephalians (the sister lineage to squamates), including the earliest forms such as Gephyrosaurus: the presence of a splenial; anteriorly directed ectopterygoids (not laterally directed as in rhynchocephalians); presacral pleurocentra that lack a notochordal canal; and no dorsal (coronoid) expansion of the surangular and dentary bones. The new information presented here, along with our extensive revision of diapsid and early squamate phylogeny, unambiguously resolves the placement of Megachirella as the oldest known squamate. As expected for a squamate that is 85 million years (Myr) older than the oldest previously known articulated squamates for which the osteology is well known (Eichstaettisaurus and Ardeosaurus from the Late Jurassic of Germany), Megachirella retains numerous plesiomorphic features. These features are observed in other diapsid reptiles, and some are retained in rhynchocephalians, but they are almost entirely lost in crown squamates. These include amphicoelic vertebrae (although present in geckoes and Huehuecuetzpalli), a small quadratojugal, gastralia and an entepicondylar foramen in the humerus.

Assessing the phylogenetic position of Megachirella and other lepidosauromorph reptiles is challenging because there has never been a phylogenetic dataset comprising a rich sampling of both non-lepidosaurian diapsid reptiles and squamates. Almost invariably, broad-scale reptile phylogenies have represented the nearly 10,000 extant species and the hundreds of fossil species of squamates as a single operational taxonomic unit (for more examples, see Supplementary Methods). This approach oversimplifies the enormous diversity of phenotypes and genotypes in squamates. Conversely, studies focused on squamate phylogeny never include more than a few taxa outside the Squamata to serve as outgroups. Here we create the first morphological phylogenetic dataset comprising all the main branches of the diapsid tree of life, including extant taxa and fossils from all major lineages of rhynchocephalians (for example, tuataras) and squamates at the species level (Supplementary Data 1–5). We also focused on primary data collection, personally observing numerous specimens covering 100% of the taxa included in this dataset. We performed a meticulous revision of reptile and squamate phylogenetic characters (and created new characters) to avoid issues caused by logical or biological biases in morphological characters. Owing to the rich sampling of extant squamate species, we also included molecular data from 16 loci (13 nuclear and 3 mitochondrial). The analyses performed include morphological and combined evidence (morphological and molecular data) analyses of diapsid and lepidosaurian relationships, carried out under multiple phylogenetic inference methods (see Methods).
Fig. 1 | Holotype of M. wachtleri (PZO 628). a, b, Whole skeleton in dorsal (a) and ventral (b) views. c, d, Skull in dorsal (c) and ventral (d) views. e, Palatal region in ventral view. f, Braincase in left lateral view. g, Dentary in cross-section. h, i, Right forelimb in dorsal (h) and ventral (i) views. Abbreviations: Al.Cr., prootic alar crest; A.Sc.C., anterior semicircular canal; Ax, axis; Boc, basioccipital; Bptg.Pr., basipterygoid process; Bsp, basisphenoid; C, coronoid; Cap, capitulum; Cb, ceratobranchial; Ce.R., cervical rib; Ce.V., cervical vertebrae; Cl, clavicle; Co, coracoid; C.P., cultriform process; Cr.D., crista dorsalis; D, dentary; Do.V., dorsal vertebrae; D.T., dentary teeth; Ect, ectopterygoid; Ect.Fr., ectepicondylar foramen; Ent.Fr., entepicondylar foramen; F, frontal; H, humerus; J, jugal; La.W., labial wall; Li.Cr., lingual crest; M, maxilla; M.C., medial centrale; McI, metacarpal I; P, parietal; Opi, opisthotics; Pal, palatine; POF, postorbitofrontal; POP, paraoccipital process; PrF, prefrontal; Pro, prootic; Ptg, pterygoid; Ptg.Q.Pr., pterygoid quadrate process; Ptg.T.R., pterygoid tooth rows; Ptg.Tr.Pr., pterygoid transverse process; Q, quadrate; Ra, radius; RAP, retroarticular process; Sca, scapula; Spl, splenial; Sq, squamosal; Ul, ulna; Ul.P., ulnar patella. Scale bars, 10 mm (a, b), 5 mm (c–f, h, i) and 1 mm (g).

Despite the difference in the datasets used (that is, morphology versus combined evidence) and phylogenetic optimality criteria, all results converge on Megachirella representing a stem squamate, along with Marmoretta oxoniensis from the Middle Jurassic of Britain and Huehuecuetzpalli mixtecus from the Early Cretaceous period of Mexico (Fig. 2 and Extended Data Figs. 3–8). This resolution is particularly well supported in the combined evidence analysis, in which Megachirella has a leaf stability above the overall mean (Extended Data Fig. 9). In analyses with maximum parsimony, Sophineta cracoviensis also falls within the Squamata stem, but this is not recovered in the remaining analyses. This indicates that some taxa previously proposed to be early-evolving lepidosauromorphs (for example, Megachirella and Marmoretta) actually represent the oldest known squamates, partially filling the supposed 70-Myr fossil gap in the early history of the clade. Other taxa also considered to be early lepidosauromorphs by previous studies (for example, kuehneosaurids and Saurosternon) are consistently found in our results to be nested in other parts of the diapsid tree outside the Lepidosauromorpha. Additionally, all previous morphology- and molecular-based squamate phylogenies available in the literature disagree with each other concerning the earliest-evolving crown group squamates: iguanians for morphology-based analyses, but dibamids and gekkotans for molecular analyses (see also Supplementary Methods). The results of combined evidence analyses typically match those of the molecular data alone; however, our results show unprecedented agreement between morphological and molecular data in placing geckoes instead of iguanians among the earliest-evolving squamates. Iguanians are consistently found further crownward in the tree, nested either with anguimorphs and snakes (clade Toxicofera; Extended Data Figs. 3, 5–8) or with teiioids (Extended Data Fig. 4). This unprecedented agreement between molecular and morphological data with regard to the early evolution of squamates might be a consequence of our broad sampling of taxa outside squamates (thus affecting character polarity and branch length parameters) and strict criteria for morphological dataset construction.

Megachirella provides unique insights into the early acquisition of squamatan features, as it is the first unequivocal squamate from the Triassic. Megachirella, and also Huehuecuetzpalli, show that features that are commonly attributed to squamates characterize crown squamates, but were not yet present in stem squamates. For instance, Megachirella and Huehuecuetzpalli still retain amphicoelic vertebrae and an entepicondylar foramen, and lack a ball-like distal epiphysis of the ulna. Megachirella further indicates that the loss of the quadratojugal and gastralia occurred within squamates, and not at the point of divergence from rhynchocephalians. The same pattern occurs in rhynchocephalians, for which Triassic and Early Jurassic fossils were previously known, and which retain plesiomorphic features (such as pleurodont dentition) that are absent in most of the later members of that group.

Fig. 2 | Combined evidence relaxed-clock Bayesian inference analysis with total evidence tip and node dating using the fossilized birth–death tree model. Summary of the majority rule consensus tree depicting the median divergence time estimates for the major diapsid and squamate

Previous molecular-clock estimates have placed the squamate crown divergence time between the Late Triassic and Early Jurassic, and the origin of lepidosaurs at some point in the Triassic or the Middle Permian period. Our time-calibrated Bayesian inference analyses combine information from both the molecular and morphological relaxed clocks on lepidosaurs and other diapsid lineages (Fig. 2 and Extended Data Fig. 8), providing a more holistic approach to the divergence times of squamates, lepidosaurs and other diapsids. Our estimates indicate that lepidosaurs originated 269 Myr ago (median estimate) in the Middle Permian, and crown squamates 206 Myr ago in the Late Triassic (thus agreeing with recent phylogenomic analyses). Furthermore, our morphological sampling allows a more precise estimate of the origin of the squamate root by the inclusion of fossils now recognized as stem squamates, and thus the age of origin of all squamates can be set at 257 Myr ago, close to the Permian/Triassic mass extinction (PTME).

Some of the oldest known fossils for certain diapsid lineages are known from the earliest Triassic, including ichthyosaurs, sauropterygians and archosaurs, with more recent fossil evidence already indicating the presence of archosauriforms in the Late Permian; this strongly suggests that their divergence preceded the PTME. Accordingly, our divergence time estimates for almost all major diapsid lineages (such as lepidosaurs, archosauriforms and marine reptiles) are in the Permian (Fig. 2 and Extended Data Fig. 8) and not the Triassic (the period from which their oldest fossils are known). This corresponds to the general expectation that the oldest known fossil of a lineage is likely to be much younger than the actual divergence time for that lineage.

The origin of lepidosaurs and other major diapsid lineages before the PTME contradicts previous ideas suggesting that those groups originated in the aftermath of the greatest mass extinction in Earth's history. Instead, our results indicate that those lineages already existed, but radiated in the Triassic. It is likely that the PTME opened new niches and opportunities to lineages previously restricted in diversity, thus enabling their radiation in the Triassic into numerous forms and sizes, occupying all major biomes on the planet.
METHODS
Micro-CT. The holotype of Megachirella wachtleri was analysed by micro-CT at the Multidisciplinary Laboratory of the Abdus Salam International Centre for Theoretical Physics (Trieste, Italy), using a system specifically designed in collaboration with Elettra-Sincrotrone Trieste (Basovizza, Italy) for the study of palaeontological and archaeological materials. The micro-CT acquisition of the complete specimen was carried out using a sealed X-ray source (Hamamatsu L8121-03) at a voltage of 150 kV, a current of 100 µA and a focal spot size of 20 µm. The X-ray beam was filtered by a 1.5-mm-thick aluminium absorber. A set of 2,400 projections of the sample was recorded over a total scan angle of 360° by a flat panel detector (Hamamatsu C7942SK-25) with an exposure time of 2.0 s. The resulting micro-CT slices were reconstructed in 16-bit format using the commercial software DigiXCT (DIGISENS) with an isotropic voxel size of 42.51 µm. Additionally, the proximal part of the sample was re-analysed (voltage 150 kV, current 100 µA, 1-mm copper filter, exposure time per projection 3.0 s and 1,800 projections over 360°), setting an effective pixel size of 18 µm, and reconstructed using the same software to achieve a higher spatial resolution.
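The acquisition parameters above imply the angular sampling and a minimum exposure budget for each scan. The sketch performs that arithmetic only. The exposure total is a lower bound that ignores detector readout, stage rotation and any frame averaging, none of which are reported here. The function name is hypothetical.

```python
def scan_summary(n_projections, total_angle_deg, exposure_s):
    """Angular step (degrees) and summed exposure time (minutes) of a CT scan.

    The exposure total is a lower bound on acquisition time: detector readout,
    stage rotation and any frame averaging are ignored.
    """
    step_deg = total_angle_deg / n_projections
    exposure_min = n_projections * exposure_s / 60.0
    return step_deg, exposure_min


full_scan = scan_summary(2400, 360.0, 2.0)      # whole specimen
proximal_scan = scan_summary(1800, 360.0, 3.0)  # higher-resolution re-scan
```

This gives 0.15° steps and at least 80 min of exposure for the full specimen, and 0.2° steps and at least 90 min for the proximal re-scan.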

Morphological dataset construction. All taxa used in this study were personally 
observed by at least one of us, and more than half by two or more of the co-authors. 
The new dataset presented herein includes a large sample of species of squamates, 
as well as a broad variety of non-squamatan lepidosaurs and non-lepidosaurian 
diapsid species, representing all of the major clades of diapsid reptiles. Characters 
were assessed based on primary homology assessment and according to strict 
criteria for character construction, to avoid biases owing to logical or biological 
dependencies across characters, overweighting of any anatomical attributes and 
many other issues that may affect the morphological component of phylogenetic 
datasets. We selected Protorothyris archeri as the outgroup to our analyses and all
morphological characters were treated as unordered (see Supplementary Methods 
for additional details). 

Molecular dataset alignment, model selection and partitions. The molecular dataset consists of 16 genetic markers (13 nuclear and 3 mitochondrial loci) for 38 extant taxa. A complete list of sampled loci and sequence lengths is provided in Supplementary Table 1. Sequence data for the selected coding regions were obtained from GenBank (Supplementary Data 2). For three ingroup taxa for which molecular data were not available (Liolaemus signifer, Pristidactylus scapulatus and Stenocercus scapularis), we used sequences of the congeneric species L. ornatus, P. torquatus and S. guentheri, respectively. Sequences were aligned on the MAFFT 7.245 online server using the global alignment strategy with iterative refinement and consistency scores. For the protein-coding genes, alignments were verified by translating nucleotide sequences to amino acids. The final multiple sequence alignment was concatenated and visually examined in Mesquite 3.04. Molecular sequences from all extant taxa were analysed for the best partitioning scheme and model of evolution using PartitionFinder2 under the Akaike information criterion.
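Model selection under the Akaike information criterion compares candidates using AIC = 2k − 2 ln L, where k is the number of free parameters; the lowest score wins. PartitionFinder2 performs this comparison internally. The sketch only illustrates the criterion, with hypothetical scheme names, log-likelihoods and parameter counts.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def rank_by_aic(candidates):
    """candidates: dict mapping scheme name -> (ln L, number of free parameters).

    Returns the best scheme and each scheme's AIC difference from it.
    """
    scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, {name: score - scores[best] for name, score in scores.items()}


# Hypothetical schemes: a merged partition versus one partition per locus.
best, delta = rank_by_aic({"merged": (-1000.0, 10), "by_locus": (-985.0, 20)})
```

In this toy case, the extra 10 parameters of the per-locus scheme are justified because they raise ln L by 15, which lowers AIC by 10.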

Equal weights maximum parsimony analysis. Analyses were conducted in TNT v.1.1 using the new technology search algorithms. This strategy enables the sampling of trees from a broader spectrum of local optima than is allowed by the heuristic search with ratchet runs in PAUP* v.4.0 beta 10, especially for large datasets. Tree searches were conducted using 1,000 initial trees by random addition sequences, with 100 iterations or rounds for each of the four new technology search algorithms: sectorial search, ratchet, drift and tree fusing. The output trees were used as the starting trees for subsequent runs, using 1,000 iterations/rounds of each of the new technology search algorithms. The latter step was repeated once, and the final output trees were filtered for all the most parsimonious trees (MPTs). A total of 621 MPTs were obtained, with 2,268 steps each.
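The tree length reported for each MPT (2,268 steps) is the sum over characters of the minimum number of state changes the tree requires. For unordered characters, this count can be computed with Fitch's algorithm. The sketch below scores one fixed rooted binary tree; it does not perform TNT's search over tree space. Taxon names, the toy matrix and the function names are hypothetical, and a multi-state set is used to encode polymorphic or missing entries.

```python
def fitch_steps(tree, states):
    """Minimum number of changes for one unordered character on a rooted
    binary tree, using Fitch's (1971) bottom-up pass.

    tree   : nested 2-tuples of taxon names, e.g. (("A", "B"), ("C", "D"))
    states : dict mapping taxon -> set of states; a set with several states
             can encode polymorphism or missing data
    """
    def visit(node):
        if isinstance(node, str):           # leaf: its observed state set
            return set(states[node]), 0
        left, right = node
        s_left, n_left = visit(left)
        s_right, n_right = visit(right)
        shared = s_left & s_right
        if shared:                          # intersection: no extra change
            return shared, n_left + n_right
        return s_left | s_right, n_left + n_right + 1  # union: one change

    return visit(tree)[1]


def tree_length(tree, matrix):
    """Sum of Fitch steps over all characters (equal weights).

    matrix : dict mapping taxon -> list of state sets, one per character
    """
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_steps(tree, {taxon: row[i] for taxon, row in matrix.items()})
        for i in range(n_chars)
    )


# Hypothetical four-taxon example with two characters.
toy_tree = (("A", "B"), ("C", "D"))
toy_matrix = {"A": [{0}, {0}], "B": [{0}, {1}], "C": [{1}, {0}], "D": [{1}, {1}]}
```

Under equal weights, every step costs the same. The first toy character needs one change on this tree and the second needs two, giving a length of 3.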

Implied weights maximum parsimony analysis. Analyses were also conducted in TNT using the implied weighting algorithm, with K = 12 and collapsing all branches with support = 0. Tree searches were conducted as for the equal weights parsimony analysis. Larger K values than the default (3.0) have been reported to perform better for large datasets. A total of five best-fit trees were obtained (fit = 91.768892) and used to calculate the strict consensus tree.
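Under Goloboff's implied weighting, each character's homoplasy h (its extra steps beyond the minimum possible) is penalized concavely. A common formulation scores K/(K + h) per character as fit, or equivalently h/(h + K) as distortion, so that larger K makes the weighting gentler. The sketch computes both sums for a list of extra-step counts. It is not intended to reproduce the fit value reported above or TNT's exact handling of missing entries. The function name is hypothetical.

```python
def implied_weights_scores(extra_steps, k=12.0):
    """Concave (implied) weighting scores for a set of characters.

    extra_steps : homoplasy h of each character (observed steps minus the
                  minimum possible steps) on a given tree
    k           : concavity constant (K = 12 in the analysis described above)
    Returns (total_fit, total_distortion), where each character contributes
    k / (k + h) to the fit and h / (h + k) to the distortion.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    fit = sum(k / (k + h) for h in extra_steps)
    distortion = sum(h / (h + k) for h in extra_steps)
    return fit, distortion
```

A character with no homoplasy contributes a full unit of fit, while one with h = K contributes only half. This is how implied weighting down-weights noisy characters without fixing weights in advance.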
Bayesian inference analyses. Analyses were conducted using MrBayes v.3.2.6 on the Cedar computer cluster, made available through Compute Canada, and on the CIPRES Science Gateway v.3.3. Molecular partitions were analysed using the models of evolution obtained from PartitionFinder2 (see dataset), and the morphological partition was analysed with the Mkv model.

The distribution for rate heterogeneity was tested for best fit to the data under both gamma (Γ) and log-normal distributions, as it was recently demonstrated that a log-normal distribution may better fit morphological data for a large variety of datasets. Fit to the data was assessed using Bayes factors (B10) calculated with the marginal model likelihoods obtained from the stepping-stone sampling method. The results of the model fit were interpreted as previously described: 2 log_e(B10) > 2, positive evidence against model M0; 2 log_e(B10) > 6, strong evidence against model M0; 2 log_e(B10) > 10, very strong evidence against model M0. However, 2 log_e(B10) was less than one between the Γ and log-normal runs, indicating that there was no significant difference in fit to the morphological data between the two distributions. The morphological partition was thus analysed under the Γ model for all subsequent analyses.
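The Bayes-factor comparison above reduces to a difference of log marginal likelihoods followed by a threshold lookup. The sketch uses the thresholds listed in the Methods. The marginal log-likelihood values and function names are hypothetical; in practice, the inputs would be the stepping-stone estimates from the two runs.

```python
def two_ln_bayes_factor(ln_marginal_m1, ln_marginal_m0):
    """2 ln(B10) from the log marginal likelihoods of models M1 and M0
    (for example, stepping-stone estimates)."""
    return 2.0 * (ln_marginal_m1 - ln_marginal_m0)


def evidence_against_m0(two_ln_b10):
    """Verbal category for 2 ln(B10), using the thresholds listed in the Methods."""
    if two_ln_b10 > 10:
        return "very strong"
    if two_ln_b10 > 6:
        return "strong"
    if two_ln_b10 > 2:
        return "positive"
    return "not significant"


# Hypothetical marginal log-likelihoods for log-normal (M1) and gamma (M0) runs.
score = two_ln_bayes_factor(-5000.0, -5000.4)
```

A difference of 0.4 log units gives 2 ln(B10) = 0.8, below 2. In that case the simpler or conventional model is retained, which mirrors the choice of Γ described above.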
Time-calibrated relaxed-clock Bayesian inference analyses. We implemented 
‘total-evidence-dating’ using the fossilized birth-death tree model with sampled 
ancestors, under a relaxed-clock model in Mr. Bayes v.3.2.6*”-*°. The chosen 
relaxed-clock model is the independent +) rate relaxed-clock model*®. This is a 
continuous uncorrelated relaxed-clock model using a gamma distribution to 
assess clock rate variation across lineages. The latter is compatible with the fos- 
silized birth-death tree model, unlike the compound Poisson process relaxed- 
clock model*’. The base clock rate was given an informative prior, which was 
derived from the non-clock Bayesian inference analysis: the median value for tree 
height in substitutions from the entire posterior trees sample divided by the age 
of the tree, which is based on the median of the distribution for the root prior: 
25.1658/325.45 = 0.0773, or ln(0.0773) = −2.560061 on the natural log scale. We chose
to use the exponential of the mean to provide a broad standard deviation
(e^0.0773 = 1.080366), as previously recommended’. The sampling strategy was set to diversity, which
is more appropriate when extant taxa are sampled in a manner that maximizes 
diversity (as performed herein) and fossils are sampled randomly*”*. Diversity 
sampling is very common in higher-level phylogenies, and not accounting for it has 
a strong effect on tree inference, pushing divergence times further back and producing
unreasonably old and variable divergence time estimates**°!. This is a considerable
advantage of using Mr. Bayes for divergence time estimation over the implementations
currently available in the software package BEAST™.
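The arithmetic behind the clock-rate prior can be reproduced in a few lines. This is a sketch using only the numbers quoted in the text; the reading of tree height as expected substitutions per site is our assumption, and the text does not show how the two resulting values are passed to Mr. Bayes (presumably as the log-scale mean and standard deviation of a log-normal prior).

```python
import math

# Values quoted in the text.
median_tree_height = 25.1658  # median tree height from the non-clock analysis (assumed substitutions per site)
root_age_median = 325.45      # median of the root-age prior (Myr)

# Base clock rate, rounded to four decimals as in the text.
clock_rate = round(median_tree_height / root_age_median, 4)

# Mean of the prior on the natural-log scale.
log_mean = math.log(clock_rate)

# "Exponential of the mean", used as a deliberately broad standard deviation.
log_sd = math.exp(clock_rate)

print(f"rate = {clock_rate}, ln(rate) = {log_mean:.6f}, sd = {log_sd:.6f}")
```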

The wealth of fossil taxa in our dataset, including some of the oldest known 
taxa for many clades, provided numerous calibration points. Therefore, the vast 
majority of our calibrations were based on tip dating, which accounts for the uncer- 
tainty in the placement of fossil taxa and avoids the issue of bound estimates for 
node-based age calibrations””. The fossil ages used for tip dating correspond to the 
uniform prior distributions on the age range of the stratigraphic occurrence of the 
fossils (available in Supplementary Table 2). However, it has recently been
demonstrated that using tip dates alone can produce unrealistically old divergence
time estimates for some clades*>**. Therefore, when we lacked the oldest known 
fossils for any of the clades in our analysis (namely, captorhinids, choristoderes, 
snakes and rhynchocephalians), we used node-age calibrations with a soft lower 
bound as long as the age of the oldest known fossil was well-established and there 
was overwhelming support in the literature (and in all our other analyses) for their
monophyly. Combined with the diversity sampling strategy, this dating
protocol can ensure reliable divergence time estimates.

The age of the root was set with a soft lower bound, which gives a low (but 
non-zero) likelihood of the age being older than the lower bound value. Minimum 
and maximum root bounds were placed as follows. The minimum age was set at 
the oldest possible age for the oldest known reptile, Hylonomus (from the Joggins 
Formation in Nova Scotia, Canada), which comes from the late Bashkirian Stage 
(early Pennsylvanian, Late Carboniferous) and is between 318 and 315 Myr old®. 
Considering Petrolacosaurus may be as much as 307 Myr old, placing the mini- 
mum age at 318 Myr seems consistent, as the most recent common ancestor of 
diapsids and captorhinids must have been at least a few million years older than 
Petrolacosaurus. The maximum age was based on the maximum soft age for the 
reptile-synapsid split®®, 332.9 Ma. 

Convergence of independent runs was assessed using an average standard devi- 
ation of split frequencies of approximately 0.01, potential scale reduction factors 
of approximately 1 for all parameters*’ and an effective sample size greater than 
200 for each parameter. 
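The first two convergence criteria can be illustrated with a short sketch. It assumes per-run split frequencies and post-burn-in parameter samples are already extracted; the function names and tolerances are hypothetical, the PSRF follows the basic Gelman-Rubin formula, and Mr. Bayes' own diagnostics may use slightly different conventions. ESS is taken as an input rather than computed here.

```python
from statistics import mean, stdev, variance


def asdsf(split_freqs):
    """Average standard deviation of split frequencies across independent runs.

    split_freqs maps each split label to its posterior frequencies, one per run.
    Uses the sample standard deviation over all splits; Mr. Bayes applies its
    own conventions (e.g. a minimum split frequency), so values may differ.
    """
    return mean(stdev(freqs) for freqs in split_freqs.values())


def psrf(chains):
    """Gelman-Rubin potential scale reduction factor for one parameter.

    chains: list of m equally long lists of post-burn-in samples.
    """
    m, n = len(chains), len(chains[0])
    chain_means = [mean(c) for c in chains]
    grand_mean = mean(chain_means)
    # Between-chain variance B and mean within-chain variance W.
    b = n / (m - 1) * sum((cm - grand_mean) ** 2 for cm in chain_means)
    w = mean(variance(c) for c in chains)
    # Pooled estimate of the posterior variance.
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5


def looks_converged(asdsf_value, psrf_values, ess_values,
                    asdsf_max=0.01, psrf_tol=0.01, ess_min=200):
    """Check the three criteria in the text (tolerances are assumptions)."""
    return (asdsf_value <= asdsf_max
            and all(abs(r - 1.0) <= psrf_tol for r in psrf_values)
            and all(e > ess_min for e in ess_values))
```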

Leaf stability. Leaf stability was assessed using RogueNaRok”*, which computes the
difference between the highest and the second-highest support values
for alternative resolutions of each taxon quartet or triplet in the dataset (LSdif)°’.
We applied this method to the posterior trees from the Bayesian inference analysis 
including both the morphological and molecular data. Because of the large number 
of taxa and trees, it was necessary to downsample the posterior trees from each
analysis (100,000 trees after discarding burn-in). The final sample consisted of
10,000 trees (every tenth tree), selected using the
Burntrees script for Perl (https://github.com/nylander/Burntrees). Taxon names
and raw data relating to each number depicted in Extended Data Fig. 9 can be 
found in Supplementary Table 3. 
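The thinning step and the LSdif score can be sketched as follows. This is an illustration, not RogueNaRok or Burntrees: it assumes the support of each of the three possible resolutions of every quartet has already been tabulated from the tree sample, and it averages LSdif over all quartets containing a taxon. The function names are hypothetical.

```python
def thin(trees, every=10):
    """Keep one tree in every `every` trees (e.g. 100,000 -> 10,000)."""
    return trees[::every]


def lsdif(quartet_support, taxon):
    """Mean LSdif of `taxon` over the quartets that contain it.

    quartet_support maps a frozenset of four taxa to the support values
    (e.g. frequencies in the tree sample) of its three possible resolutions.
    For each quartet, LSdif is the highest minus the second-highest support.
    """
    diffs = []
    for quartet, supports in quartet_support.items():
        if taxon in quartet:
            top, second = sorted(supports, reverse=True)[:2]
            diffs.append(top - second)
    return sum(diffs) / len(diffs)


# Hypothetical quartet supports, not data from this study.
support = {
    frozenset("ABCD"): (0.9, 0.1, 0.0),
    frozenset("ABCE"): (0.5, 0.4, 0.1),
}
print(lsdif(support, "A"), lsdif(support, "D"))
```

A taxon whose quartets are consistently resolved one way scores close to 1; a rogue taxon that flips between resolutions scores close to 0.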

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. The micro-CT scan data are available from the authors upon 
reasonable request. The morphological and molecular datasets for the phylogenetic 
analyses, including the Mr. Bayes parameters block, are available as Supplementary 
Information. 
Extended Data Fig. 1 | Cranial anatomy of M. wachtleri (PZO 628) based on personal
examination and micro-CT scan data. a, Skull in dorsal view. b, Skull in posteroventral
view. c, Skull in anteroventral view. d, Skull in right ventrolateral view. e, Skull in
left dorsal lateral view. f, Line drawing of the skull in dorsal view. g, Reconstruction
of the skull in dorsal view. h, Detailed view of right lateral side of the skull.
i, Drawing of the view in h. San, surangular. Scale bars, 5 mm (a–g).
Extended Data Fig. 2 | Cranial and postcranial anatomy of M. wachtleri
(PZO 628) based on personal examination and micro-CT scan data.
a, Cross-section of the skull at the level of the frontals in anterior view.
b, Details of the anterior end of the left dentary in occlusal view. c, Left
quadrate. d, Whole body of the holotype as preserved in the slab (dorsal
view). e, Anterior cervical vertebrae in left lateral view. f, Longitudinal
section of the anterior cervicals in ventral view. g, Last cervicals and
anterior dorsals in dorsal view. h, Pectoral girdle in ventral view. i, Pectoral
girdle in left ventrolateral view. j, Right humerus in ventral view. k, Right
manus in dorsal view. l, Line drawing of right manus in dorsal view.
Ax.R., axis rib; Ce.PL, cervical, pleurocentrum; Co, cotyle; C.V.3, third
cervical vertebra; dc2–5, distal carpals 2–5; DPC, deltopectoral crest; D.R.,
dorsal rib; D.T., dentary teeth; Epi.St., epiphysial suture; H.Epi., humeral
epiphysis; i, intermedium; lc, lateral centrale; McI–V, metacarpals I–V;
N.A., neural arch; Olf.Tr., olfactory tract; Po.Co., posterior cotyle; Qj.Fr.,
quadratojugal foramen; Qj.St., quadratojugal suture; r, radiale; Sbd.Sh.,
subdentary shelf; Sof.Pr., subolfactory processes; u, ulnare. Scale bars,
1 mm (a, b), 5 mm (c, e–h, j–l), 10 mm (d, i).
Extended Data Fig. 3 | Equal weights maximum parsimony analysis, morphological data
only. Strict consensus of 621 most parsimonious trees (2,268 steps each). Numbers at
nodes indicate Bremer indices.

Extended Data Fig. 4 | Implied weighting maximum parsimony analysis, morphological
data only. Strict consensus of the five best-fit trees (fit = 91.768892).

Extended Data Fig. 5 | Bayesian inference analysis, morphological data only. Bayesian
majority-rule consensus tree. Numbers at nodes indicate posterior probabilities.

Extended Data Fig. 6 | Bayesian inference analysis, combined morphological and
molecular data. Bayesian majority-rule consensus tree. Numbers at nodes indicate
posterior probabilities.

Extended Data Fig. 7 | Relaxed-clock Bayesian inference analysis with
total-evidence tip dating using the fossilized birth-death tree model,

Extended Data Fig. 8 | Relaxed-clock Bayesian inference analysis with
total-evidence tip and node dating using the fossilized birth-death
tree model, combined morphological and molecular data. Bayesian
majority-rule consensus tree. Numbers at nodes indicate median estimates
for the divergence times, and node bars indicate the 95% highest posterior
density for divergence times.

Extended Data Fig. 9 | Taxon stability plotted against taxon
completeness in the analysis combining both morphological and
molecular data. a, Taxon stability in uncalibrated Bayesian inference
analysis. b, Taxon stability in relaxed-clock Bayesian inference analysis
with tip dating. Taxon stability increases in direct proportion to taxon
completeness. M. wachtleri (taxon 67, in red) has a stability slightly
above average for uncalibrated Bayesian inference, and well above
average for Bayesian inference with tip dating. All taxa are identified in
Supplementary Table 3 (n = 129 taxa). Regression line in blue and 95%
confidence interval in grey. Labels for extant taxa (~100% completeness)
are omitted for simplicity.


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


LETTER 


https://doi.org/10.1038/s41586-018-0138-7 


Long-term effects of species loss on community 
properties across contrasting ecosystems 


Paul Kardol, Nicolas Fanin & David A. Wardle


Biodiversity loss can heavily affect the functioning of ecosystems, 
and improving our understanding of how ecosystems respond to 
biodiversity decline is one of the main challenges in ecology’. 
Several important aspects of the longer-term effects of biodiversity 
loss on ecosystems remain unresolved, including how these 
effects depend on environmental context>-’. Here we analyse data 
from an across-ecosystem biodiversity manipulation experiment 
that, to our knowledge, represents the world’s longest-running 
experiment of this type. This experiment has been set up on 30 lake
islands in Sweden that vary considerably in productivity and
soil fertility owing to differences in fire history*®. We tested the
effects of environmental context on how plant species loss affected 
two fundamental community attributes—plant community 
biomass and temporal variability—over 20 years. In contrast to 
findings from artificially assembled communities’*”, we found 
that the effects of species loss on community biomass decreased 
over time; this decrease was strongest on the least productive 
and least fertile islands. Species loss generally also increased 
temporal variability, and these effects were greatest on the most 
productive and most fertile islands. Our findings highlight that the 
ecosystem-level consequences of biodiversity loss are not constant 
across ecosystems and that understanding and forecasting these 
consequences necessitates taking into account the overarching role 
of environmental context. 

Biodiversity experiments have previously shown that diverse com- 
munities are more efficient in capturing resources, and therefore 
produce more biomass than species-poor communities’. Some experi- 
ments have also shown that the effects of plant diversity on biomass pro- 
duction increase over time as complementarity of resource use among 
species increases!°"'”. This has led to suggestions that ecosystem-level 
consequences of biodiversity loss might be stronger than predicted 
from short-term experiments. However, the available evidence emerges 
from artificially and randomly assembled communities!”!*. How the 
effects of species loss develop over time in natural ecosystems and, 
importantly, how the long-term effects of biodiversity loss vary among 
ecosystems remain untested, despite growing evidence that the strength 
of relationships between plant diversity and productivity can vary 
with environmental conditions—notably soil resource availability". 
If soil resource availability is an important driver of plant diversity- 
productivity relationships, then variation in soil resources could have 
important consequences for how effects of species loss change over 
time and in the long term. 

The diversity of plant communities can also buffer temporal variabil- 
ity in response to external perturbations and fluctuations in environ- 
mental conditions'*!°. Greater temporal invariability (the consistency 
of biomass production over time or 1/(coefficient of variation (CV)); 
also referred to as ‘temporal stability’'*”'”) in more diverse communities 
is commonly ascribed to a greater temporal complementarity between 
species that results from a higher asynchrony of species responses to 
environmental fluctuations!*+!8. However, uncertainty remains as to 
whether there are generalizable relationships between plant diversity
and temporal invariability in ecosystem functioning'®!®”°. Although
recent evidence suggests that the addition of nitrogen can moderate 
the effects of species diversity on temporal invariability of community 
biomass”!, empirical tests of how these relationships vary with envi- 
ronmental context in natural ecosystems are scarce’. 
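Temporal invariability as defined above (1/CV) is the mean of a biomass time series divided by its standard deviation. A minimal sketch, assuming the sample standard deviation (the text does not say which estimator was used) and hypothetical biomass values:

```python
from statistics import mean, stdev


def invariability(biomass_series):
    """Temporal invariability 1/CV = mean / standard deviation of a time series.

    Uses the sample standard deviation (n - 1 denominator); this choice is an
    assumption, not taken from the source.
    """
    cv = stdev(biomass_series) / mean(biomass_series)
    return 1.0 / cv


# Hypothetical annual biomass values (g per m2) for one plot, not study data.
print(invariability([8.0, 10.0, 12.0]))
```

Higher values mean more consistent biomass production from year to year.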

We examined how environmental context modifies the effects of
species loss over time, using data from an across-ecosystem biodiversity
manipulation experiment set up in 1996 in a post-fire chronosequence 
that consists of 30 forested lake islands in northern Sweden”. Here, 
the main disturbance is wildfire: large islands burn more often than 
smaller ones, which creates a successional gradient across the islands. 
Large islands have greater soil fertility and greater supply rates of avail- 
able soil nutrients, and are more productive relative to smaller ones” 
(Extended Data Table 1). In line with previous work on this system®”?, 
we divided the islands into three size classes (large (>1.0 ha), medium 
(0.1-1.0 ha) and small (<0.1 ha)) of ten islands each®. For each island, 
plots were established comprising a full-factorial combination of 
three dwarf shrub species removals (removal of Vaccinium myrtillus, 
Vaccinium vitis-idaea and Empetrum hermaphroditum) (see Methods). 
These species account for >98% of vascular plant biomass in the under- 
story layer®. The ecosystems we studied have comparatively low plant 
alpha-diversity, as is characteristic of extensive areas of the boreal 
zone worldwide, and are therefore likely to be especially vulnerable 
to species loss. Our study design enables us to explore the effects of all
combinations of the same three species across widely contrasting
environments.

For each plot we took measurements of the biomass of each shrub 
species annually from 1996 to 2016 (see Methods); as expected, this 
revealed generally lower biomass in single- and two-species plots 
than in three-species plots. However, the effects of removal of certain 
species, or combinations of species, on total plant biomass strongly 
differed among island size classes (Fig. 1 and Extended Data Table 1). 
We predicted that the effects of species loss would be larger on smaller 
islands, owing to lower soil resource availability and greater resource 
partitioning among species”*. This would constrain the extent to which 
the remaining species could compensate for lost species. However, our 
data instead show that the effects of species loss across different-sized 
islands are strongly species-dependent (Fig. 1). The biomass of indi- 
vidual species was also strongly affected by plant species removal, with 
these effects also often varying with island size (Extended Data Fig. 1 
and Extended Data Table 1). These species-specific effects are partly 
explicable by different dominance patterns of the three shrub species 
among island size classes”, which are in turn associated with interspecific 
differences in nutrient acquisition strategies”. 

We then tested whether the strength of the effects of species loss 
on community biomass increases over time'®””, and if the magnitude 
of this increase is greater on small islands, as is expected if plants on 
small—and therefore unproductive—islands were to suffer most from 
a temporal decrease in resource complementarity. Artificially and 
randomly assembled biodiversity experiments have shown that
complementarity increases over time; for example, through increasing nitrogen
retention in more diverse communities”’. We found that the effects
of species removal on total plant biomass varied through time (Fig. 1
and Extended Data Table 1), but, contrary to expectations, the strength of
the effects of species loss on plant biomass actually decreased. The
decrease over time in the amount of variation explained by species
richness did not depend on island size (Fig. 2 and Extended Data Table 2).
However, the amount of variation explained by removal treatments
also decreased, and this decrease was strongest for small islands (Extended Data
Fig. 2 and Extended Data Table 2). Polynomial regressions revealed
some nonlinearity in these temporal patterns (see Supplementary
Information). To further examine the extent to which the effects of
species loss depended on context, we also tested how commonly used
quantitative measures of the effect of biodiversity on plant biomass
changed over time, and how these temporal changes varied with island
size. These analyses indicated that the temporal decrease in the effects
of species loss could largely be explained by changes in species
complementarity (see Supplementary Information).

Fig. 1 | Effects of plant species removal on temporal plant biomass
patterns. a–c, Temporal patterns (1999–2016; years 3–20) of total plant
biomass (g per m²) for large (a), medium (b) and small (c) islands.
Species codes refer to the plant species remaining (not removed):
M, V. myrtillus; V, V. vitis-idaea; E, E. hermaphroditum. Thick
dark-coloured lines show mean values per treatment (n = 10 islands per size
class except for E treatments on large islands (n = 8), E treatments on
medium islands (n = 5), M + E treatments on medium islands (n = 8),
E treatments on small islands (n = 9), M treatments on small islands
(n = 8) and M + E treatments on small islands (n = 9)). Thin
light-coloured lines show values for individual plots. Within island size classes,
removal treatments with the same letters are not significantly different
across the duration of the study. Treatment effects were tested using linear
mixed models fitted by a restricted maximum likelihood method, and we
used contrast analyses to test across-year differences between removal
treatments (see Methods for details).

Fig. 2 | Effects of species richness on plant biomass decrease over time.
Temporal patterns (1999–2016; years 3–20) of the proportion of variance
explained in total plant biomass by realized species richness, for large,
medium and small islands. The proportion of variance explained (also
called the effect size) was calculated using marginal R² values (R²GLMM(m))
for linear mixed models (n = 10 islands per size class). Effects of island
size, year and their interaction were tested using linear mixed models,
and contrasts were applied to compare slopes among island size classes for
temporal changes in the amount of variance explained (see Extended Data
Table 2 for details). Linear regressions were fitted for each island size class
(dotted lines). P values denote slopes significantly different from zero.
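The marginal R² used as the effect size in Fig. 2 is the share of total variance attributable to the fixed effects. The sketch below assumes the usual Nakagawa and Schielzeth definition that the R²GLMM(m) notation refers to; the software actually used is not stated in this section, and the inputs (fixed-effect fitted values and variance components from an already fitted model) and function name are hypothetical.

```python
from statistics import variance


def r2_glmm(fixed_predictions, random_variances, residual_variance):
    """Marginal and conditional R2 for a linear mixed model.

    fixed_predictions: fitted values from the fixed effects only.
    random_variances: variance components of the random effects.
    residual_variance: residual variance.
    Returns (marginal R2, conditional R2).
    """
    var_fixed = variance(fixed_predictions)
    var_random = sum(random_variances)
    total = var_fixed + var_random + residual_variance
    marginal = var_fixed / total                   # fixed effects only
    conditional = (var_fixed + var_random) / total # fixed plus random effects
    return marginal, conditional


# Hypothetical values, not estimates from this study.
print(r2_glmm([1.0, 2.0, 3.0], [0.5], 0.5))
```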

Contrary to results obtained from highly controlled, randomly 
assembled communities!™!”, we show with long-term data derived 
from manipulations of low-diversity, naturally-assembled commu- 
nities that although biodiversity loss does significantly reduce plant 
biomass (Fig. 1 and Extended Data Table 1), this effect does not 
necessarily increase over time. This could occur through the mitigation 
of the effects of species loss by compensatory responses of remaining 
species”®, with this mitigation strengthening over time. Our findings 
highlight that such compensatory responses depend greatly on interactions
between species identity and environmental context. This
indicates that ecosystems are resilient to species loss if other species
are able to occupy the newly available niche space. Because of the low
diversity of the boreal ecosystem we studied, the shape of its resilience
pattern is likely determined by the growth responses of the remaining
species. In other systems, colonization
rates of species from external pools may be equally or even more 
important. One possible reason why previous experiments using 
controlled, randomly assembled communities have often found that 
the effects of biodiversity increase over time is that they constrain 
the niche widths of species and do not allow colonization by species 
from external pools. 

Finally, we tested whether species loss increases the temporal 
variance (or reduces 1/CV, invariability) in community biomass and 
whether this increase is strongest for small islands, on the basis that 
in low-productivity systems the species remaining after biodiversity 
loss would be less capable of maintaining complementary interactions 
under temporally fluctuating conditions’. Asynchronous responses 
among species are particularly important in maintaining biomass pro- 
duction over time, especially in low-productivity systems”!. In partial 
support of this, the temporal variability of total plant community 
biomass differed significantly among species removal treatments and 
these effects differed with island size, for the most-recent ten years 
of the experiment (Fig. 3a-c and Extended Data Table 3). Contrary 
to our expectation, we found that—relative to the three-species mix- 
tures (plots with no species removals)—temporal invariability (1/CV) 
was significantly lower in at least three reduced-diversity treatments
on medium and large islands, but in only one one-species treatment
on small islands. Across all removals, we also found significant
positive relationships between realized species richness and temporal
invariability in community biomass that were independent of island
size (Fig. 3d–f and Extended Data Table 3). When analysed for the
initial years of our experiment, in which there was an overall downward
trend in biomass (see Methods), or for the full experimental
period, the effects of species removal on temporal variability were less
pronounced (Supplementary Table 8 and Supplementary Figs. 11, 12).
However, our findings support theoretical work”* proposing that
the negative effects of species loss on the consistency of biomass
production over time depend strongly on which species are lost and on the
compensatory dynamics of the remaining species over time, and they highlight
that these effects are mediated by environmental context and associated
differences in plant dominance.

Fig. 3 | Effects of species removal and species richness on temporal
biomass invariability. a–c, Effects of plant species removals on temporal
invariability (1/CV) in community biomass (2007–2016; years 11–20)
for large (a), medium (b) and small (c) islands. Species codes (M, E, V)
refer to the plant species remaining (not removed). Bar graphs show mean
± s.e. (n = 10 islands per size class, except for E treatments on large
islands (n = 8), E treatments on medium islands (n = 5), M + E treatments
on medium islands (n = 8), E treatments on small islands (n = 9), M
treatments on small islands (n = 8), and M + E treatments on small islands
(n = 9)). Dots indicate values for individual islands. Data were analysed
using linear mixed models (see Methods for details). Within island size
… different across years (Tukey’s post hoc test, P < 0.05). d–f, Relationship
… (2007–2016) for large …

Biodiversity loss is known to have important consequences for the
structure and productivity of plant communities and associated
ecosystem functions!~*. Most available empirical evidence is from studies
conducted under highly controlled conditions in which plant diversity
has been varied through random draws from species pools, but
our results show that different patterns may occur when species are
removed from natural ecosystems!°!?. Moreover, we show that the
long-term effects of diversity loss are not consistent and can differ
greatly across contrasting ecosystems. Although human impacts often
cause the loss of biodiversity across contrasting types of ecosystems”,
our results reveal that this loss of biodiversity affects the functioning
of different ecosystems in contrasting ways. The loss of biodiversity in
natural ecosystems under global change is not a random process, and
we show that, for a type of low-diversity system that is globally widespread
but little studied from a diversity-functioning perspective, it
matters not only which species are lost and to what extent the remaining
species are able to exploit the available resources, but also from
which ecosystems they are lost. In acknowledging the need for testing
the generality of our findings for other systems, including more-diverse
systems that have been much more widely studied, we emphasize that
forecasting the consequences of biodiversity loss for the functioning
METHODS


No statistical methods were used to predetermine sample size. Sample size (that is, 
the numbers of islands used) was based on analysis of data from a previous study in 
this system using a larger number of islands*°. The experimental treatments were 
randomized, but investigators were not blinded to treatments during measurement 
because it would not have been tractable to do so. 

Study system. This study was conducted along a post-fire chronosequence
consisting of 30 forested islands in lakes Hornavan and Uddjaure in northern Sweden
(65° 55′ N to 66° 09′ N; 17° 43′ E to 17° 55′ E). The islands are all of the same geologic
age and experience the same macroclimate. Mean annual precipitation is 750 mm,
of which 350 mm falls during the growing season (May to September), and the
mean annual temperature is 0.5 °C (13 °C in July and −14 °C in January). The only
major extrinsic factor that varies among the islands is the frequency of wildfire, 
with larger islands burning more frequently than smaller islands because they have 
a greater probability of being struck by lightning**°. Previous work has shown 
that as islands become smaller and the time since fire increases, the availability of 
nutrients (notably nitrogen and phosphorus) decreases; this leads to impairment 
of the rates of decomposition and nutrient fluxes, and lower plant biomass and 
productivity®?2°->2. Consistent with previous studies in this system®2>°-*, these
islands were divided into three size classes with 10 islands per class: large (>1.0 ha), 
medium (0.1-1.0 ha) and small (<0.1 ha), with a mean time since last major fire 
of about 585, 2,180 and 3,250 years at the start of the experiment, respectively’. 
This level of replication was necessary to correctly identify important
differences among island size classes*”. The overstory vegetation is dominated by Betula
pubescens, Pinus sylvestris and Picea abies, and the ground-layer vegetation consists 
of the ericaceous dwarf shrubs V. myrtillus, V. vitis-idaea, and E. hermaphroditum, 
and feather mosses. Extended Data Table 4 shows a selection of ecosystem prop- 
erties for each of the three island size classes. 

Experimental design. For this study we focused on the dwarf shrub vegetation, 
and used a removal experiment approach because this is a powerful tool for inves- 
tigating the effects of local, non-random losses of biotic components and species 
interactions in natural ecosystems**. We established 8 experimental plots on each 
of the 30 islands, each representing a different dwarf shrub species removal treat- 
ment; that is, full factorial removal of V. myrtillus, V. vitis-idaea, and E. hermaph- 
roditum?. One of these plots involved removal of all shrub species and was not 
considered in this study. The level of species richness we manipulated is typical of 
the diversity of understory vegetation that characterizes boreal forests at the spatial 
scale of our plots. As such, our system is representative of low-diversity commu- 
nities for which ecosystem functioning is especially vulnerable to species loss. 
V. myrtillus, V. vitis-idaea and E. hermaphroditum dominate the ericaceous shrub 
layer in large, medium and small islands, respectively, and collectively account for 
>98% of vascular plant biomass in the understory® (see Supplementary Table 9
for details on productivity and leaf traits for each of the shrub species across the 
island size gradient). All plots were 55 × 55 cm, but only the inner 45 × 45 cm
was measured. All plots were located at similar distances from the shore for each 
island, regardless of island size, to prevent edge and microclimatic effects from 
confounding the results*”. The experiment was established in August 1996 and has 
been maintained annually ever since. Shrub removal treatments were conducted 
and maintained through annual physical removal of vegetation. We recognize that 
vegetation removals impose initial disturbance effects, but these are likely to be 
transient and of minimal importance after the initial years****. In a separate 14-year 
experiment performed on the dwarf shrub community on all 30 islands and in plots 
that are adjacent to the plots used in this study, and that involved experimental 
disturbance treatments with greater disturbance than the removals performed in 
this study*®, it was shown that disturbance legacies of vegetation removal are mostly 
gone within 3-6 years. Further details of the removal experiment reported here 
have previously been presented’. 
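The full-factorial removal design can be enumerated directly. This sketch shows how three removal factors give eight plots per island, how dropping the all-removed plot leaves seven, and how seven treatments on 30 islands give the total of 210 plots stated under Data analyses. The variable and function names are illustrative only.

```python
from itertools import product

SPECIES = ("V. myrtillus", "V. vitis-idaea", "E. hermaphroditum")
N_ISLANDS = 30  # 10 large, 10 medium and 10 small islands

# Each treatment is a tuple of booleans: True means that species is removed.
all_treatments = list(product((False, True), repeat=len(SPECIES)))

# The plot with all three shrub species removed was not considered.
used_treatments = [t for t in all_treatments if not all(t)]


def remaining(treatment):
    """Species left in a plot after applying a removal treatment."""
    return [sp for sp, removed in zip(SPECIES, treatment) if not removed]


for t in used_treatments:
    print(", ".join(remaining(t)))

print(len(all_treatments), len(used_treatments), len(used_treatments) * N_ISLANDS)
```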

Every August from 1996 until 2016 inclusive, the total cover of shrub species 
was assessed in each plot by point quadrat analysis, by determining the total
number of times the vegetation of that species was intercepted by a total of 100 
downwardly projecting points. The total number of intercepts for each species was 
then converted to biomass per unit area through equations previously developed 
by destructively sampling calibration plots®. For each of the three shrub species 
the total number of point intercepts is very closely correlated with aboveground 
standing biomass, with R² values that are consistently above 0.90°. During the
first ten years of the experiment, we observed an overall downward trend in plant 
community biomass across island sizes. This trend was not related to the removal 
treatments; it also occurred in the control plots (Fig. 1). There are two plausible 
explanations. First, it could have been a function of where we positioned the plots 
when the experiment was set up in 1996, as we always placed our plots in well- 
developed vegetation. As part of their natural population dynamics, long-lived 
dwarf shrub species in boreal forest tend to move around over time*”. This means 
that well-developed shrub patches may well decline in biomass over time whereas 
less developed shrub patches may well aggrade to become well-developed patches 
in the future. Second, over the course of the experiment plant biomass was likely
to have turned over only about three times”, and if during one of the cycles there 
is high (or low) biomass, then this high (or low) biomass situation will persist 
for some years. It is plausible that if there were environmental conditions that 
promoted high biomass in all plots at the start of the experiment then that effect 
would have a legacy that would persist several years into the experiment. We 
emphasize that this downward trend is a non-biased pattern that has no bearing 
on conclusions that we can draw from this study. 
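The calibration equations themselves are not given in this section. As a rough illustration only, the sketch below makes three assumptions. It treats the relation between point intercepts and harvested biomass as linear, fits it by ordinary least squares, and reports the R² used to judge such a fit. The function names and calibration numbers are hypothetical and are not the published equations.

```python
# Minimal sketch: linear calibration of point-intercept counts against harvested
# biomass. The linear form and the example data are hypothetical assumptions.

def fit_calibration(intercepts, biomass):
    """Fit biomass = a + b * intercepts by ordinary least squares.

    Returns the intercept a, the slope b and the coefficient of determination R^2.
    """
    n = len(intercepts)
    if n != len(biomass) or n < 2:
        raise ValueError("need at least two paired calibration plots")
    mean_x = sum(intercepts) / n
    mean_y = sum(biomass) / n
    sxx = sum((x - mean_x) ** 2 for x in intercepts)
    if sxx == 0:
        raise ValueError("intercept counts must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(intercepts, biomass))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(intercepts, biomass))
    ss_tot = sum((y - mean_y) ** 2 for y in biomass)
    return a, b, 1.0 - ss_res / ss_tot


def intercepts_to_biomass(count, a, b):
    """Convert a plot's total intercept count to biomass per unit area."""
    return a + b * count


if __name__ == "__main__":
    # Hypothetical calibration plots: intercept counts vs harvested biomass (g per m^2)
    counts = [10, 20, 30, 40, 50]
    harvested = [52, 98, 155, 199, 251]
    a, b, r2 = fit_calibration(counts, harvested)
    print(f"a={a:.2f}, b={b:.3f}, R^2={r2:.4f}")
    print(intercepts_to_biomass(35, a, b))
```

In practice one such equation would be fitted separately for each shrub species.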

Data analyses. To account for the initial disturbance effects of the removal
treatments, the first three years of data (1996-1998) were excluded from the data
analyses. Further, in all analyses we used only plots in which at least one of the three
shrub species was present at the start of the experiment (that is, after
implementation of the removal treatments). This resulted in the a priori exclusion of 13 out of
a total of 210 plots across all islands (large islands: 2 plots from which V. myrtillus 
and V. vitis-idaea were removed; medium islands: 5 plots from which V. myrtillus 
and V. vitis-idaea were removed and two plots where V. vitis-idaea was removed; 
small islands: 1 plot from which V. myrtillus and V. vitis-idaea were removed, 1 plot 
from which V. vitis-idaea was removed, and two plots from which V. vitis-idaea 
and E. hermaphroditum were removed). 

Effects of plant species removal, island size class and their interaction on 
total plant biomass and biomass of each of the three shrub species were tested 
using linear mixed models (LMMs) fitted by a restricted maximum likelihood 
method. Island identity was included as a random effect to account for the repeated 
measurements across time and plots were nested within island identity to enable 
comparison of treatments within each island separately. We fitted a first-order 
auto-regressive variance structure. To further explore the effects of species
removal on total and plant-species-specific biomass, we ran LMMs separately
for each island size class and used contrast analyses to test across-year differences
between removal treatments. To evaluate potential effects of the overall
downward trend in plant biomass during the first ten years of the experiment, we ran
all LMMs separately for 1999-2006, 2007-2016 and 1999-2016. These separate
analyses showed that effects of species removal and island size were largely
consistent across these different stages of the experiment (see Extended Data Table 1).
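A first-order auto-regressive (AR(1)) structure assumes that the residual correlation between two measurements on the same unit decays geometrically with the number of years separating them, so that Cov(e_s, e_t) = σ²ρ^|s−t|. The sketch below only builds that covariance matrix for hypothetical values of σ² and ρ. Estimating those values, together with the fixed and random effects by restricted maximum likelihood, would be done in dedicated mixed-model software and is not shown.

```python
# Sketch of the AR(1) residual covariance assumed for repeated yearly measurements.
# sigma2 (residual variance) and rho (lag-1 correlation) are hypothetical inputs.

def ar1_covariance(n_times, sigma2, rho):
    """Return the n_times x n_times AR(1) covariance matrix as nested lists."""
    if n_times < 1:
        raise ValueError("n_times must be positive")
    if sigma2 <= 0:
        raise ValueError("sigma2 must be positive")
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return [[sigma2 * rho ** abs(i - j) for j in range(n_times)] for i in range(n_times)]


def lag_correlation(rho, lag):
    """Correlation implied between two measurements `lag` time steps apart."""
    return rho ** abs(lag)


if __name__ == "__main__":
    # e.g. 18 annual measurements (1999-2016) with a hypothetical rho of 0.6
    cov = ar1_covariance(18, sigma2=1.0, rho=0.6)
    print([round(v, 3) for v in cov[0][:5]])
    print(round(lag_correlation(0.6, 5), 4))
```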

To test how the effects of species loss on plant community biomass changed over 
time, we calculated the proportion of variance explained by the species removal 
treatment and by species richness (to better account for species lost over the course 
of the experiment) separately for each year of the experiment (1999-2016) and for 
each island size class (small, medium and large). For each plot, the proportion of 
variance explained (also called effect size) was calculated using marginal R² values
(R²GLMM(m)) for LMMs, with species removals or realized species richness (that
is, the number of shrub species present across the entire experimental period)
considered as fixed factors and island identity as a random factor. The values of the
marginal R²GLMM(m) describe the proportion of variance explained by fixed factors
alone. We then tested how R²GLMM(m) changed over time using LMMs with island
size class and year as fixed factors and year as a continuous variable. Contrasts
on the interaction between island size class and year were used to test whether
the slopes of regressions between year and R²GLMM(m) differed between island
size classes. Finally, we used polynomial models to test for nonlinear patterns.
In these models, we employed LMMs with island size class and the second-order 
polynomial of year as fixed factors and year as a continuous variable. To compare 
whether polynomial models varied among island size classes, we compared models 
by grouping island size classes two-by-two and tested whether the effect of the 
grouping variable was significant. 
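A marginal R² of this kind is commonly computed following Nakagawa and Schielzeth. It is the variance of the fixed-effect predictions divided by the sum of that variance, the random-effect variance(s) and the residual variance. Adding the random-effect variance to the numerator gives the conditional R². The minimal sketch below assumes that the fixed-effect predictions and variance components have already been extracted from a fitted model; the example numbers are hypothetical.

```python
import statistics


def _variance_parts(fixed_predictions, random_variances, residual_variance):
    # Variance of the fixed-effect predictions (X @ beta); sample variance assumed.
    var_fixed = statistics.variance(fixed_predictions)
    var_random = sum(random_variances)
    return var_fixed, var_random, var_fixed + var_random + residual_variance


def marginal_r2(fixed_predictions, random_variances, residual_variance):
    """Proportion of variance explained by the fixed effects alone."""
    var_fixed, _, total = _variance_parts(fixed_predictions, random_variances, residual_variance)
    return var_fixed / total


def conditional_r2(fixed_predictions, random_variances, residual_variance):
    """Proportion of variance explained by fixed and random effects together."""
    var_fixed, var_random, total = _variance_parts(
        fixed_predictions, random_variances, residual_variance
    )
    return (var_fixed + var_random) / total


if __name__ == "__main__":
    # Hypothetical fitted values and variance components for one year and size class
    preds = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(marginal_r2(preds, [1.5], 1.0), conditional_r2(preds, [1.5], 1.0))
```

Repeating such a calculation for each year and island size class would give the time series of R²GLMM(m) whose trend is then tested.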

To better understand the underlying mechanisms of the effects of species loss on 
total and plant-species-specific biomass and how they change over time, we used 
two approaches for comparing biomass in multi-species communities relative to 
their component monocultures; in our study, monocultures consisted of the three
treatments in which only one of the three species was not removed. These measures
were calculated separately for three-species mixtures (that is, plots for which no
removals were performed) and for each of the three possible two-species mixtures
(that is, plots for which one of the three species was removed). First, we used the
additive partitioning approach to separate net biodiversity effects into
complementarity and selection effects. Net biodiversity effects measure the deviation of
biomass in any given multi-species community from its expected value based on
biomass of its component species grown in monoculture. Complementarity effects
measure the average relative species biomass in multi-species communities relative
to the expected biomass based on the weighted-average biomass of their component
monocultures. Selection effects measure the effects of species with higher or
lower than average biomass in monocultures on total community biomass. Second,
we calculated two measures of complementarity that are independent of absolute
biomass values: relative yield totals and transgressive overyielding. The relative
yield of a species measures its biomass in a multi-species community as a
proportion of its biomass in monoculture, and the relative yield total of a multi-species
community is the sum of the relative yields of all component species. Transgressive


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


LETTER 


overyielding measures whether multi-species communities obtain higher biomass
than the monoculture of their most productive component species. Transgressive
overyielding was calculated as Dmax following a previously published method.
We then tested how each of the measures of biodiversity effects that we calculated
changed over time using LMMs and polynomial models as described above for
R²GLMM(m) and with individual islands (n = 10 for each of the three island size
classes) used as units of replication. We fitted a first-order auto-regressive variance
structure to account for temporal pseudoreplication through repeated
measurements on each island. Measures of biodiversity effects could not be calculated
whenever calculations included denominator values of zero, and there were also
some cases (<1%) in which very low denominator biomasses resulted in extremely
high values of biodiversity measures; these extreme values were removed from the
analyses following the outlier labelling rule with a conservative tuning parameter
of g = 2.2.
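The biodiversity measures above can be sketched for a single mixture as follows. This is an illustration under stated assumptions, not the authors' code, and it rests on four choices:

- The additive partitioning uses the standard Loreau and Hector formulation, with expected relative yields assumed equal to 1/N for an N-species mixture.
- Dmax is taken as (Y_O − max M_i)/max M_i, where Y_O is total mixture biomass and M_i are the monoculture biomasses.
- The outlier labelling rule uses quartiles from Python's `statistics.quantiles(..., method="inclusive")`, which may differ slightly from the quartile definition actually used.
- Zero monoculture biomass raises an error, mirroring the statement that these measures cannot be calculated with zero denominators.

```python
import statistics


def _check(mix, mono):
    if len(mix) != len(mono) or not mono:
        raise ValueError("need one mixture and one monoculture value per species")
    if any(m <= 0 for m in mono):
        # biodiversity measures are undefined with zero denominators
        raise ValueError("monoculture biomass must be positive")


def additive_partition(mix, mono):
    """Return (net, complementarity, selection) effects for one mixture.

    mix[i]: biomass of species i in the mixture; mono[i]: its monoculture biomass.
    Assumes an expected relative yield of 1/N for each of the N species.
    """
    _check(mix, mono)
    n = len(mono)
    expected_ry = 1.0 / n
    delta_ry = [y / m - expected_ry for y, m in zip(mix, mono)]
    mean_dry = sum(delta_ry) / n
    mean_mono = sum(mono) / n
    complementarity = n * mean_dry * mean_mono
    # population covariance (divide by n) so that net = complementarity + selection
    cov = sum((d - mean_dry) * (m - mean_mono) for d, m in zip(delta_ry, mono)) / n
    selection = n * cov
    net = sum(mix) - expected_ry * sum(mono)
    return net, complementarity, selection


def relative_yield_total(mix, mono):
    """Sum over species of mixture biomass divided by monoculture biomass."""
    _check(mix, mono)
    return sum(y / m for y, m in zip(mix, mono))


def d_max(mix, mono):
    """Transgressive overyielding: positive if the mixture beats its best monoculture."""
    _check(mix, mono)
    best = max(mono)
    return (sum(mix) - best) / best


def drop_outliers(values, g=2.2):
    """Outlier labelling rule: keep values within [Q1 - g*IQR, Q3 + g*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - g * iqr, q3 + g * iqr
    return [v for v in values if low <= v <= high]


if __name__ == "__main__":
    # Hypothetical two-species mixture and its monocultures (g per m^2)
    mix, mono = [30.0, 240.0], [100.0, 300.0]
    print(additive_partition(mix, mono))
    print(relative_yield_total(mix, mono), d_max(mix, mono))
```

With these hypothetical numbers the net effect of 70 splits into a complementarity effect of 20 and a selection effect of 50. The relative yield total is 1.1, and a negative Dmax indicates no transgressive overyielding.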

Finally, to test how species loss affects temporal invariability in plant commu- 
nity biomass, we calculated the inverse coefficient of variation (1/CV); that is, the 
mean biomass over the study period divided by the standard deviation for that
period. The inverse of the coefficient of variation provides a widely used
standardized measure of invariability that is comparable across ecosystems, and that is
commonly referred to as 'temporal stability'. We focused on temporal invariability
for the most recent ten-year period, that is, 2007-2016. This is consistent with
the ten-year period used in comparable analyses of temporal invariability, and
was done because up to 2007 there was a general decline in total plant community
biomass, whereas over the last ten years of the experiment there was no apparent
directional change in total community biomass (Fig. 1). However, we also
calculated temporal invariability for 1999-2006 and for 1999-2016. For each period,
effects of plant species removal and island size class on temporal invariability were 
then tested using LMMs with island identity as random factor. Within island size 
classes, contrast analyses were used to test differences between species removal 
treatments. Finally, relationships between realized species richness and temporal 
invariability were tested using linear regression. Similar to the biomass data, 
all analyses for invariability were run separately for 1999-2006, 2007-2016 and 
1999-2016. 
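Temporal invariability as defined above is the mean of a plot's yearly biomass values divided by their standard deviation over the chosen period. The sketch makes two assumptions. It uses the sample standard deviation (n − 1 denominator), which the text does not specify. It also takes a hypothetical dictionary mapping year to total plot biomass, so the same function can be reused for 1999-2006, 2007-2016 and 1999-2016.

```python
import statistics


def temporal_invariability(values):
    """Inverse coefficient of variation (1/CV) = mean / standard deviation."""
    if len(values) < 2:
        raise ValueError("need at least two years of data")
    sd = statistics.stdev(values)  # assumption: sample SD with n - 1 denominator
    if sd == 0:
        raise ValueError("1/CV is undefined when biomass does not vary")
    return statistics.mean(values) / sd


def invariability_for_period(biomass_by_year, start, end):
    """1/CV of the yearly values falling within [start, end] inclusive."""
    values = [b for year, b in sorted(biomass_by_year.items()) if start <= year <= end]
    return temporal_invariability(values)


if __name__ == "__main__":
    # Hypothetical plot: total biomass (g per m^2) by year
    plot = {2005: 410.0, 2006: 395.0, 2007: 380.0, 2008: 402.0, 2009: 388.0}
    print(round(invariability_for_period(plot, 2007, 2016), 2))
```

Higher values indicate more invariant biomass. The per-plot values would then be the response in the LMMs and regressions described above.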

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 


Data availability. The datasets generated during and/or analysed during the 
current study are available from the corresponding author on reasonable request. 
Extended Data Fig. 1 | Effects of plant species removal on temporal
biomass patterns (1999-2016, years 3-20) of individual species
biomass. a-i, Data show individual species biomass (g per m²) for
V. myrtillus (a-c), V. vitis-idaea (d-f) and E. hermaphroditum (g-i) for
large, medium and small islands. Species codes (M, V, E) refer to the plant
species remaining after removal. Thick dark-coloured lines show mean
values per treatment (n = 10 islands per size class, except for E treatments
on large islands (n = 8), E treatments on medium islands (n = 5),
M + E treatments on medium islands (n = 8), E treatments on small
islands (n = 9), M treatments on small islands (n = 8) and M + E
treatments on small islands (n = 9)). Thin light-coloured lines show values
for individual plots. Within island size classes, removal treatments with
the same letters are not significantly different across years through the
duration of the study. Treatment effects were tested using linear mixed
models fitted by a restricted maximum likelihood method, and we
used contrast analyses to test across-year differences between removal
treatments (see Methods for details).
Extended Data Fig. 2 | Temporal patterns (1999-2016; years 3-20)
of the proportion of variance in total plant biomass explained by the
species-removal treatment for large, medium and small islands. The
proportion of variance explained (also called the effect size) was calculated
using marginal R² values (R²GLMM(m)) for linear mixed models (n = 10
islands per size class). Linear regressions were fit for each island size
class (dotted lines). We used linear mixed models to test how R²GLMM(m)
changed over time with island size class and year as fixed factors and year
as a continuous variable. Contrasts on the interaction between island size
class and year were used to test if the slopes of regressions between year
and R²GLMM(m) differed between island size classes. Significant differences
in slopes among island size classes at α = 0.05 are indicated in the panel.
Extended Data Table 1 | Effects of species removal, island size and their interactions on total and species-specific plant biomass


Period     Source of variation    Total biomass           V. myrtillus            V. vitis-idaea          E. hermaphroditum
                                  df      F      P        df      F      P        df      F      P        df      F      P
1999-2016  Species removal (SR)   6,149   23.24  <0.001   3,75    16.35  <0.001   3,80    3.51   0.019    3,69    22.09  <0.001
           Island size (IS)       2,27    0.33   0.721    2,27    15.60  <0.001   2,27    4.00   0.030    2,27    20.04  <0.001
           SR x IS                12,149  6.10   <0.001   6,75    2.49   0.030    6,80    5.28   <0.001   6,69    1.71   0.133
1999-2006  Species removal (SR)   6,149   23.55  <0.001   3,75    10.37  <0.001   3,80    2.85   0.043    3,69    20.93  <0.001
           Island size (IS)       2,27    0.69   0.509    2,27    15.83  <0.001   2,27    4.46   0.021    2,27    19.05  <0.001
           SR x IS                12,149  6.24   <0.001   6,75    1.85   0.102    6,80    3.95   0.002    6,69    1.73   0.128
2007-2016  Species removal (SR)   6,149   14.60  <0.001   3,75    26.15  <0.001   3,80    4.19   0.008    3,69    17.54  <0.001
           Island size (IS)       2,27    0.52   0.598    2,27    13.60  <0.001   2,27    3.14   0.060    2,27    19.54  <0.001
           SR x IS                12,149  4.94   <0.001   6,75    3.06   0.001    6,80    5.45   <0.001   6,69    1.41   0.224


Data were analysed using linear mixed models for 1999-2016, as well as separately for 1999-2006 and 2007-2016 (n = 10 islands per size class; see Methods for details).
Significant P values (<0.05) are shown as bold numbers. Results from contrast analyses testing across-year differences in total and plant-species-specific biomass between removal treatments for 
large, medium and small islands are indicated in Fig. 1 and Extended Data Fig. 1 (1999-2016) and in Supplementary Table 10 (1999-2006 and 2007-2016). For the analyses of total biomass all 
seven species removal treatments were used; for each of the analyses for V. myrtillus, V. vitis-idaea and E. hermaphroditum, only the treatments from which those species were not removed were used. 
Extended Data Table 2 | Effects of species removal, island size and
their interactions on the amount of variance explained (R²GLMM(m))
by species removal treatment and by realized species richness


                                    Amount of variance explained by
                                    Species removal     Species richness
Source of variation         df      F        P          F        P
Island size                 2,48    3.33     0.044      0.54     0.589
Year                        1,48    107.0    <0.001     45.3     <0.001
Island size x Year          2,48    3.28     0.046      0.52     0.596

Slope comparisons                   z        P          z        P
Large vs. medium                    1.66     0.2208     0.39     0.918
Large vs. small                     -0.86    0.6667     -0.62    0.808
Medium vs. small                    -2.52    0.0317     -1.02    0.567


The proportion of variance explained (also called the effect size) was calculated using marginal R²
values (R²GLMM(m)) for linear mixed models (n = 10 islands per size class). Data were analysed
using linear mixed models, and contrasts were applied to compare slopes among island size
classes for temporal changes in the amount of variance explained. Significant P values (<0.05)
are shown as bold numbers. See Methods for model details.
Extended Data Table 3 | Effects of species removal or realized
species richness, and island size, and their interactions on temporal
invariability


     Source of variation               df       F        P
A    Species removal (SR)              6,148    6.10     <0.001
     Island size (IS)                  2,27     1.23     0.309
     SR x IS                           12,148   2.13     0.018
B    Realized species richness (R)     1,163    30.06    <0.001
     Island size (IS)                  2,27     0.46     0.637
     R x IS                            2,163    0.12     0.885


Data were analysed using linear models testing for the effects of species removal, island size
(small, medium and large) and their interactions (A), and of realized species richness, island
size and their interactions (B), on temporal invariability (1/CV, where CV is the coefficient of
variation) in plant community biomass. Data used are for the period 2007-2016 (n = 10 islands per size class).
Significant P values (<0.05) are shown as bold numbers.


Extended Data Table 4 | Selected ecosystem properties across the island size gradient


                                                   Island size
Ecosystem property                                 Small             Medium            Large
Time since last fire (years)                       3250 ± 439 a      2180 ± 385 b      585 ± 233 c
Net primary productivity (g/m²/yr)                 159 ± 18 b        247 ± 12 a        256 ± 14 a
Standing plant biomass (g/m²)                      3470 ± 470 b      8340 ± 877 a      9349 ± 485 a
Vascular plant species richness (number of
  species in a 10 m radius circular plot)
Humus C to N ratio                                 32.9 ± 0.79 b     36.0 ± 1.17 ab    40.4 ± 1.18 a
Humus C to P ratio                                 759 ± 30 a        687 ± 36 ab       623 ± 20 b
Humus N to P ratio                                 23.3 ± 1.1 a      19.1 ± 0.9 b      15.4 ± 0.5 c
Mineral N (MIN) (µg N/g)                           25.3 ± 8.0 b      58.1 ± 9.2 a      38.2 ± 14.4 ab
Dissolved organic N (DON) (µg N/g)                 40.3 ± 4.6 b      50.7 ± 5.5 a      39.1 ± 7.2 b
MIN/(MIN+DON)                                      0.39 ± 0.03 b     0.53 ± 0.05 a     0.49 ± 0.04 a
Mineral P (µg P/g)                                 24.4 ± 2.3 b      37.7 ± 4.3 a      43.6 ± 4.9 a
Membrane-extractable P (mmol/kg)                   4.9 ± 0.3 b       6.5 ± 0.4 a       5.9 ± 0.7 ab
Light transmission (%)                             68.6 ± 2.6 a      47.1 ± 3.7 b      55.8 ± 4.5 ab


Data shown are mean values ± standard errors (n = 10 islands per size class). Within each row, differences between numbers followed by the same letter are not statistically significant at P = 0.05
(Tukey's test, following one-way ANOVA). Data are from previous studies.
Reciprocal signalling by Notch-Collagen V-CALCR
retains muscle stem cells in their niche


Meryem B. Baghdadi, David Castel, Léo Machado, So-ichiro Fukada, David E. Birk, Frederic Relaix,
Shahragim Tajbakhsh & Philippos Mourikis


The cell microenvironment, which is critical for stem cell 
maintenance, contains both cellular and non-cellular components, 
including secreted growth factors and the extracellular matrix.
Although Notch and other signalling pathways have previously been
reported to regulate quiescence of stem cells, the composition
and source of molecules that maintain the stem cell niche remain
largely unknown. Here we show that adult muscle satellite (stem)
cells in mice produce extracellular matrix collagens to maintain
quiescence in a cell-autonomous manner. Using chromatin
immunoprecipitation followed by sequencing, we identified
NOTCH1/RBPJ-bound regulatory elements adjacent to specific
collagen genes, the expression of which is deregulated in
Notch-mutant mice. Moreover, we show that Collagen V (COLV) produced
by satellite cells is a critical component of the quiescent niche, as 
depletion of COLV by conditional deletion of the Col5a1 gene 
leads to anomalous cell cycle entry and gradual diminution of 
the stem cell pool. Notably, the interaction of COLV with satellite 
cells is mediated by the Calcitonin receptor, for which COLV acts 
as a surrogate local ligand. Systemic administration of a calcitonin 
derivative is sufficient to rescue the quiescence and self-renewal 
defects found in COLV-null satellite cells. This study reveals a 
Notch-COLV-Calcitonin receptor signalling cascade that maintains 
satellite cells in a quiescent state in a cell-autonomous fashion, and 
raises the possibility that similar reciprocal mechanisms act in 
diverse stem cell populations. 

Notch activation antagonizes myogenesis by induction of transcriptional
repressors (members of the HES/HEY family) and sequestration
of the co-activator Mastermind-like 1 from the muscle differentiation
factor MEF2C. However, Notch signalling has broader functions
in muscle cells, including the maintenance of quiescence. To explore
these functions, we carried out chromatin immunoprecipitation followed
by sequencing (ChIP-seq) screening and observed that intracellular
Notch (NICD) and its downstream effector RBPJ occupied and
regulated enhancers proximal to the collagen genes Col5a1, Col5a3,
Col6a1 and Col6a2, which code for collagens that are amongst the most
abundant of those produced by satellite cells (Fig. 1a, b and Extended
Data Fig. 1a-e). By analysing mouse genetic models with altered Notch
activity, we showed that the expression of these collagens tightly
correlated with Notch activity in vivo (Extended Data Fig. 2a-e). Moreover,
transcriptional induction of Col5a1 and Col5a3 by NICD translated to
elevated COLV protein levels, specifically the α1(V)α2(V)α3(V)
isoform (α3-COLV), in fetal forelimb (Fig. 1c) and adult hindlimb (tibialis
anterior muscle) myogenic cells (Fig. 1d; see Extended Data Fig. 2f
for α3-COLV antibody specificity). Furthermore, we isolated
collagen-depleted myofibres after treatment with collagenase, to monitor
de novo α3-COLV production. As Col5a1 and Col5a3 transcripts are
downregulated upon exit from quiescence (Extended Data Figs. 1a, 2g),
no α3-COLV was detected in freshly isolated or activated satellite cells.
Instead, genetic overexpression of NICD resulted in abundant, newly
synthesized α3-COLV (Fig. 1e, f).

To assess the functional role of COLV, isolated satellite cells were
incubated with COLI, COLV or COLVI in the presence of
5-ethynyl-2'-deoxyuridine (EdU) to assess proliferation and stained for PAX7,
which marks muscle stem and/or progenitor cells, and the muscle
commitment (MYOD) and differentiation (Myogenin) proteins. Only the
COLV-complemented medium delayed entry of quiescent cells into the
cell cycle (32 h, Fig. 2a), and consequently delayed their amplification
and differentiation (72 h, Fig. 2b; 10 days, Extended Data Fig. 3a-c).
As previously shown, Rbpj-/- cells underwent precocious
differentiation and this was partially antagonized by COLV, consistent with the
finding that the Col5a1 and Col5a3 genes are targets of NICD and RBPJ
(Fig. 2c, d and Extended Data Fig. 3d-g). Taken together, these results
show that COLV, specifically, sustains primary muscle cells in a more
stem-like PAX7+ state, indicating that COLV could potentially have a
role in maintaining the quiescent niche.

To determine whether COLV produced by satellite cells is a
functional component of the niche, we generated compound
Tg:Pax7-CreERT2;Col5a1flox/flox;R26mTmG (hereafter referred to as 'Col5a1 cKO')
mice, in which COLV was depleted and simultaneously lineage-traced
in GFP+ satellite cells (Fig. 3a and Extended Data Fig. 4a). Because
the α1-chain of COLV is present in all COLV isoforms (which are
trimeric), Col5a1 deletion produces cells completely lacking COLV
protein. Unexpectedly, given the general stability of collagens,
targeted deletion of Col5a1 resulted in upregulation of the
differentiation marker genes Myod (also known as Myod1) and Myog, and
a concomitant reduction of the quiescence marker Calcr, as well as
Pax7, only 18 days after tamoxifen treatment (Fig. 3b). Mutant cells
also showed ectopic expression of Myogenin (Fig. 3c), increased
5-bromo-2'-deoxyuridine (BrdU) incorporation (Fig. 3d) and
a significant decline in PAX7+ satellite cells (Fig. 3e). The Col5a1 cKO
cells did not undergo apoptosis (data not shown), but fused to give
rise to GFP-marked myofibres (Fig. 3f). Therefore, blocking de novo
synthesis of COLV resulted in the spontaneous exit of satellite cells
from quiescence, and differentiation, a phenotype reminiscent of
Notch loss-of-function.

To investigate the role of Col5a1 in regeneration, we examined the
morphology of tibialis anterior muscles of Col5a1 cKO mice, 18 days
after cardiotoxin-mediated injury (Fig. 3a). Notably, mutant myogenic
cells produced smaller nascent myofibres compared to control
cells (Fig. 3g, h). Unexpectedly, fewer self-renewing PAX7+ cells were
observed in the Col5a1 cKO mice (Fig. 3i) in spite of abundant COLV
in regenerating muscle (data not shown), probably produced by the
resident fibroblasts, suggesting a cell-autonomous role for Col5a1. To
investigate self-renewal in a more tractable system, we targeted COLV
using short interfering RNA (siRNA) on cultured isolated myofibres,
on which satellite cells proliferate and self-renew.
Consistent with our in vivo observations, Col5a1 knockdown by
siRNAs resulted in a marked decrease in the number of
self-renewing PAX7+MYOD+ cells, compared to scrambled control cells
(Extended Data Fig. 4b, c). Of note, Col5a3 siRNA phenocopied
Col5a1 siRNA, which demonstrates that the active triple helix contains
α3-COLV (Extended Data Fig. 4c).

Substrate rigidity and geometry have previously been demonstrated
to control stem cell properties, including differentiation and
self-renewal. However, we observed that COLV interacted with
myogenic cells only when added in the medium, and not when present as
a coating substrate (data not shown), which led us to speculate that it
acted as a signalling molecule rather than a biomechanical modulator.

Fig. 1 | NICD and RBPJ regulate transcription
NICD mice. d, Anti-GFP (satellite cells) and
anti-α3-COLV immunostaining on transverse
sections of quiescent adult tibialis anterior
muscles expressing NICD-IRES-GFP (Pax7CT2-NICD).
All GFP+ cells overexpressed COL5A3
(50 cells per mouse, n = 3 mice). e, Freshly fixed
single myofibres from Pax7CT2-NICD extensor
digitorum longus muscles at 0 h (left) or after
24 h in culture (right), stained for GFP and
α3-COLV. f, Vertical and horizontal optical sections
of the myofibre presented in e from Pax7CT2-NICD
mice (24 h in culture) showing COLV
surrounding NICD-GFP+ satellite cells. Scale
bars, 50 µm (c) and 10 µm (d-f). In c, d, insets
are shown at 2× magnification of main panels.
NS, not significant.

Fig. 2 | COLV delays proliferation and differentiation of satellite cells.
a, EdU pulse (for 2 h) of freshly isolated satellite cells cultured for 32 h:
COLI (35%), COLVI (34%) and COLV (18%) (n = 4 mice, >250 cells,
2 wells per condition). Tg:Pax7-nGFP mice express nuclear (n)GFP
driven by Pax7 regulatory elements. b, Immunostaining of freshly isolated
satellite cells cultured for 72 h. PAX7: 58%, 55% and 81%; Myogenin: 56%,
57% and 24% for COLI, COLVI and COLV, respectively (n = 4 mice, >250
cells, 2 wells per condition). c, Experimental scheme for satellite cells
plated overnight before collagen treatment. cKO, conditional knockout.
d, Immunostainings of freshly isolated satellite cells incubated with
collagens for 60 h (n = 3 mice, >200 cells, 2 wells per condition).
Percentage (%) is presented over total GFP+ cells. Data are mean ± s.d.;
two-sided paired t-test; *, P value calculated by two-sided unpaired t-test.
Scale bars, 50 µm.
Fig. 3 | Satellite-cell-produced COLV is required in vivo for self-renewal
and maintenance of quiescence. a, Experimental schemes for control
(Tg:Pax7-CT2;Col5a1+/+;R26mTmG), heterozygous (Tg:Pax7-CT2;
Col5a1flox/+;R26mTmG) and conditional knockout (Tg:Pax7-CT2;Col5a1flox/flox;
R26mTmG) mice. TA, tibialis anterior muscle. b, RT-qPCR of satellite cell
(Pax7, Calcr) and differentiation (Myod, Myog) markers on Col5a1-/- and
control satellite cells isolated by fluorescence-activated cell sorting from
resting muscle (n = 3 mice per genotype). Ctr, control; Het, heterozygous;
cKO, conditional knockout. c, Representative images of membrane-bound
GFP+ (mGFP) satellite cells from total muscle preparations from control
and Col5a1-null mice plated for 12 h. Arrow, mGFP+Myogenin+ cell
(n = 3 mice per genotype, >200 cells). d, mGFP+ satellite cells from total
muscle preparations plated for 12 h. Asterisk, non-recombined BrdU+
cell; arrows, mGFP+BrdU+ cells (n = 3 mice per genotype, >250 cells).
e, Satellite cell quantification in quiescent tibialis anterior muscles (seven
weeks after tamoxifen treatment) in control, heterozygous and Col5a1
cKO mice (n = 3 (control) and 4 (heterozygous and cKO) tibialis anterior
muscles per genotype). f, Immunostaining of sections from control
and Col5a1 cKO tibialis anterior muscles seven weeks after tamoxifen
treatment (n = 3 mice per genotype). g, Immunostaining of sections from
control and Col5a1 cKO tibialis anterior muscles 21 days after injury (n = 3
mice per genotype). h, Muscle cross-sectional area distribution 21 days
after injury (shown as violin plots) was significantly different in control
versus Col5a1 cKO tibialis anterior muscles, based on Kruskal-Wallis
test (n = 3 mice per genotype, 1,000 fibres analysed per mouse).
i, Immunostaining of sections 18 days after cardiotoxin injury of control
and Col5a1 cKO tibialis anterior muscles (n = 3 mice per genotype).
Percentage (%) is presented over total GFP+ cells. Data are mean ± s.d.;
two-sided unpaired t-test. Scale bars, 50 µm (c, d) and 100 µm (f, g, i).

To identify the cell surface receptor of COLV on satellite cells, we used
a myotube-formation assay (see Extended Data Fig. 3b), coupled to
inhibitors against known collagen receptors, including Integrins and
the receptor tyrosine kinase DDR1, but these did not obstruct the
anti-myogenic activity of COLV (Extended Data Fig. 5a). Because collagens
have also previously been shown to bind G-protein coupled receptors
(GPCRs), we focused on Calcitonin receptor (CALCR), which is
a GPCR critical for the maintenance of satellite cells. Only cells that
expressed CALCR showed decreased proliferation in the presence of
COLV (Extended Data Fig. 5b), and Calcr-/- satellite cells isolated from
conditional knockout Pax7CT2/+;Calcrflox/flox mice failed to respond
to COLV treatment (Fig. 4a and Extended Data Fig. 5c-e),
demonstrating that CALCR constitutes an essential mediator of the COLV
signal (Extended Data Fig. 4e). Accordingly, as CALCR is rapidly
cleared after satellite cell activation, COLV had no effect on cultured
myogenic cells that had been activated in vivo (three days after injury;
Extended Data Fig. 5f). However, we note that addition of COLV to
freshly isolated satellite cells appeared to stabilize residual CALCR and
retain Calcr gene expression, thus allowing their prolonged interaction
(Extended Data Fig. 5g-i). In summary, we show that CALCR is a
critical mediator of the effect of COLV on maintaining quiescence and on
the stemness properties of satellite cells.

To date, it has been assumed that CALCR in satellite cells is acti- 
vated by circulating calcitonin peptide hormones, which are princi- 
pally expressed by parafollicular thyroid cells; this points to systemic 
regulation of stem cell quiescence. Based on our findings, we reasoned 
that COLV serves as a local ligand for the CALCR receptor. Indeed, 
on-cell enzyme-linked immunosorbent assay experiments showed 
that COLV—but not COLI—selectively bound to cells expressing 
CALCR (Fig. 4b). Notably, this binding was functional as COLV—but 
not COLI—displayed rapid activation kinetics and upregulation of lev- 
els of intracellular cAMP, which is a downstream reporter of CALCR 
activation”* (Fig. 4c, d and Extended Data Fig. 6a). In vitro binding 
assays using the extracellular domain of CALCR did not result in robust 
interaction with COLV (data not shown). Therefore, we propose that 
binding of COLV to CALCR requires a specific configuration of the 
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Fig. 4 | Interaction of COLV with satellite cells is mediated by CALCR. 
a, Control (Ctr; Pax7^CreERT2/+;Calcr^+/+;R26^stop-YFP) and Calcr-deficient
(Calcr cKO; Pax7^CreERT2/+;Calcr^flox/flox;R26^stop-YFP) satellite cells incubated
for 10 days with COLI or COLV and immunostained for differentiation
(n = 3 mice, >250 cells). b, Binding assay of COLV and CALCR by
colorimetric on-cell enzyme-linked immunosorbent assay based on the
measurements of horseradish peroxidase absorbance. Runs test P value
< 0.0001. Results presented as ratio of absorbance over non-treated cells
(NT, orange line = 1) at 20 min of horseradish peroxidase development.
c, cAMP measurements of Calcr-transduced C2C12 cells after three
hours of treatment with acetic acid (HOAc), COLI, COLV or elcatonin.
Graph represents fold cAMP induction over average of mock cells
treated with HOAc (n = 4 assays). d, Dose-response curve of fold cAMP
concentration in Calcr-transduced C2C12 cells treated for 3 h with
increasing concentrations of COLV. Half-maximal effective concentration
(EC50) = 25.05 µg ml^−1 (n = 4 independent assays). e, Experimental
scheme of tamoxifen and elcatonin administration to Col5a1 cKO mice
and their corresponding control mice. f, RT-qPCR of satellite cell (Pax7,
Calcr) and differentiation (Myog) markers in Col5a1 cKO mutant mice
and control mice (n = 3 mice per condition) treated with elcatonin or
saline. g, Representative images of mGFP+ satellite cells from total muscle
preparations from Col5a1-null mice injected with saline or elcatonin,
plated for 12 h. Arrows, mGFP+MYOG+ cells (n = 3 mice per condition,
>200 cells). h, EdU (for 2 h) and mGFP staining of satellite cells from total
muscle preparations from control mice treated with saline or elcatonin,
plated for 36 h. Asterisk, mGFP−EdU+ cell (n = 3 mice per genotype, >400
cells). i, Experimental scheme of tamoxifen and elcatonin administration
to control and Col5a1 cKO mice. j, PAX7+ cells on tibialis anterior sections
21 days after injury in mice treated with saline or elcatonin (n = 6 mice for
control and 8 mice for cKO, per treatment). Percentage (%) is presented
over total GFP+ cells. Data are mean ± s.d.; a–c, two-sided paired t-test;
f–j, two-sided unpaired t-test. Scale bars, 25 µm.

receptor, possibly involving the extracellular loops or co-factors. Taken
together, these data demonstrate that COLV physically and functionally
interacts with CALCR.

In this study, we showed that blocking COLV production from satellite
cells resulted in rupture of quiescence and impaired self-renewal
in vivo. Combined with our ex vivo results, the similarity of these
phenotypes to Notch and CALCR signalling abrogation points to a
cell-autonomous Notch-COLV-CALCR axis that sustains muscle stem
cells in their niche. Consistent with this notion, administration of the
CALCR ligand elcatonin to control and Col5a1-null mice resulted in
upregulation of the stem cell markers Pax7 and Calcr, indicating that
the injected ligand was readily delivered to the quiescent satellite cells
(Fig. 4e, f). Notably, elcatonin mitigated the precocious Myog transcription
and protein expression levels in Col5a1 mutant cells (Fig. 4f, g).
Elcatonin also prolonged the G0-to-S transition of control satellite cells
exiting quiescence (Fig. 4h), which suggests that hyperactivation of
CALCR could drive cells into a deeper, more dormant-like quiescent
state marked by higher Pax7 expression. Therefore, CALCR activity
appears to control quiescence quantitatively, shown by the loss of
satellite cells in the absence of the ligand COLV, and qualitatively, shown
by the presence of dormant-like satellite cells upon hyperactivation.
Elcatonin restored the number of PAX7+ satellite cells in regenerating
Col5a1 cKO muscles to wild-type levels (Fig. 4i, j), and in an ex vivo 
self-renewal reserve-cell model (Extended Data Fig. 6b, c). Therefore, 
we show that endogenous calcitonin levels are not sufficient to main- 
tain Col5a1-null satellite cells, and that exogenous administration of 
a calcitonin derivative rescued the defects, probably via the activation 
of CALCR. 

Here we describe a self-sustained signalling cascade orchestrated by 
the Notch pathway and propagated by the extracellular matrix of the 
immediate skeletal muscle stem cell niche (Extended Data Fig. 7). We 
propose that Notch acts as a sensor of the homeostatic environment 
by reinforcing the niche with active COLV that provides cell-
autonomous signals and maintains stem cell quiescence. Upon 
disruption of the niche and physical separation of the ligands, Notch 
signalling is sharply downregulated and stem cells exit quiescence.
This halts further production of COLV and thus favours satellite cell 
activation, as shown in our model (Extended Data Fig. 7). It would 
be of interest to investigate whether the Notch-COLV-CALCR 
signalling cascade described here applies to stem cells in other tissues 
and organisms, in which an extracellular matrix protein produced by 
the stem cell can act as a local ligand for cell-autonomous stability 
of the niche through a GPCR. The regulatory mechanism that we 
identify provides a framework to construct a more complete view 
of the stem cell niche, and to manipulate stem cell behaviour in a 
therapeutic context. 
METHODS


Mouse strains. Mouse lines used in this study have been described and provided
by the corresponding laboratories: Myf5^Cre mice, Pax7^CreERT2 mice (used to
recombine the R26^stop-NICD allele), R26^stop-NICD-GFP mice, R26^mTmG mice (ROSA26
gene trap with membrane-Tomato floxed/membrane-GFP), Rbpj^flox/flox mice,
Pax7^CreERT2/+;Calcr^flox/flox;R26^stop-YFP/stop-YFP mice (triple mutant mice provided by
S.F.) and Col5a1^flox/flox mice. Tg:Pax7-CreERT2 (used to recombine Rbpj and
Col5a1) and Tg:Pax7-nGFP lines have previously been described. All adult mice
analysed were between 8 and 12 weeks old. Animals were handled according to 
national and European community guidelines, and protocols were approved by the 
ethics committee at Institut Pasteur and the French Ministry. 

Muscle injury, tamoxifen, BrdU and elcatonin administration. For muscle 
injury, Tg:Pax7-CreERT2;Col5a1^flox/flox;R26^mTmG mice and their corresponding
controls were anaesthetized with 0.5% Imalgene/2% Rompun and the tibialis
anterior muscle was injected with 50 µl of cardiotoxin (10 µM; Latoxan).
Tg:Pax7-CreERT2;Rbpj^flox/flox;R26^mTmG mice and their corresponding controls were
injected intraperitoneally with tamoxifen three times (250 to 300 µl, 20 mg/ml;
Sigma T5648; diluted in sunflower seed oil/5% ethanol). Pax7^CreERT2;Calcr^flox/flox;
R26^stop-YFP mice and their corresponding controls were injected intraperitoneally
with tamoxifen twice (1 mg per 5 g of body weight) and euthanized two weeks
later. Pax7^CreERT2;R26^stop-NICD-IRES-GFP and Tg:Pax7-CreERT2;Col5a1^flox/flox;R26^mTmG
mice and their corresponding controls were fed a diet containing tamoxifen for
one and two weeks, respectively (Envigo, TD55125). Six days before being
euthanized, Tg:Pax7-CreERT2;Col5a1^flox/flox;R26^mTmG mice and their corresponding
controls were given the thymidine analogue BrdU (0.5 mg/ml, #B5002; Sigma) 
in the drinking water supplemented with sucrose (25 mg/ml). Elcatonin (2.5 ng 
per g of mouse, final concentration in 0.9% NaCl; Mybiosource, MBS143228) was 
injected subcutaneously eight times, every other day. Comparisons were done 
between age-matched littermates using 8-12-week-old mice. 
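The weight-based doses above translate into per-mouse amounts by simple arithmetic. The sketch below is illustrative only: the 25 g body weight is a hypothetical example, and applying the 20 mg/ml tamoxifen stock (stated above only for the Rbpj regimen) to the weight-based regimen is an assumption.

```python
"""Per-mouse dose arithmetic for weight-based injections (illustrative sketch)."""

TAMOXIFEN_MG_PER_G = 1.0 / 5.0    # 1 mg per 5 g of body weight
ELCATONIN_NG_PER_G = 2.5          # 2.5 ng per g of body weight
TAMOXIFEN_STOCK_MG_PER_ML = 20.0  # assumption: same stock as the Rbpj protocol


def dose_for_weight(body_weight_g: float, dose_per_g: float) -> float:
    """Total dose for one mouse, in the mass unit used by dose_per_g."""
    if body_weight_g <= 0:
        raise ValueError("body weight must be positive")
    return body_weight_g * dose_per_g


def injection_volume_ul(dose_mg: float, stock_mg_per_ml: float) -> float:
    """Volume of stock (in microlitres) that delivers dose_mg."""
    return dose_mg / stock_mg_per_ml * 1000.0


if __name__ == "__main__":
    weight = 25.0  # hypothetical body weight in grams
    tam_mg = dose_for_weight(weight, TAMOXIFEN_MG_PER_G)
    tam_ul = injection_volume_ul(tam_mg, TAMOXIFEN_STOCK_MG_PER_ML)
    elc_ng = dose_for_weight(weight, ELCATONIN_NG_PER_G)
    print(f"tamoxifen: {tam_mg:.1f} mg in {tam_ul:.0f} ul per injection")
    print(f"elcatonin: {elc_ng:.1f} ng per injection, "
          f"{8 * elc_ng:.0f} ng over 8 injections")
```

For a 25 g mouse this gives 5 mg of tamoxifen in 250 µl of a 20 mg/ml stock, within the 250 to 300 µl volume range quoted for the Rbpj regimen, and 62.5 ng of elcatonin per injection.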

Muscle enzymatic dissociation and stem cell isolation. Adult and fetal limb 
muscles were dissected, minced and incubated with a mix of Dispase II (Roche, 
04942078001) 3 U/ml, collagenase A (Roche, 11088793001) 100 µg/ml and DNase
I (Roche, 11284932001) 10 mg/ml in Hank's Balanced Salt Solution (Gibco) supplemented
with 1% penicillin–streptomycin (PS; Gibco) at 37°C at 60 r.p.m. in a
shaking water bath for 2 h. The muscle suspension was successively filtered through
100-µm and 70-µm cell strainers (Miltenyi, 130-098-463 and 130-098-462) and
then spun at 50g for 10 min at 4°C to remove large tissue fragments. The superna- 
tant was collected and washed twice by centrifugation at 600g for 15 min at 4°C. 
Before fluorescence-activated cell sorting (FACS), the final pellet was resuspended 
in cold Dulbecco's modified Eagle’s medium (DMEM) and 1% PS supplemented 
with 2% fetal bovine serum (FBS), and the cell suspension was filtered through 
a 40-µm strainer. Satellite cells were sorted with Aria III (BD Biosciences) using
either the GFP (Tg:Pax7-nGFP, Tg:Pax7-CreERT2;Rbpj^flox/flox;R26^mTmG or
Tg:Pax7-CreERT2;Col5a1^flox/flox;R26^mTmG) or the YFP (Pax7^CreERT2/+;Calcr^flox/flox;
R26^stop-YFP) cell markers. Isolated, mononuclear cells were collected in DMEM/1% PS/2% FBS.
Enzymatically dissociated muscle was also plated directly without FACS on 
Matrigel-coated dishes (Corning, 354248; 30 min at 37°C), and fixed 12 h later 
with 4% paraformaldehyde (PFA)/PBS. Cells were immunostained following the
protocol described in Immunostaining on cells, sections and myofibres.

Chromatin immunoprecipitation. Cultured myoblasts. Satellite cells were iso- 
lated from adult Tg:Pax7-nGFP mice and plated on dishes, coated with Delta-like 
1, for 72 h to maintain active Notch signalling, as previously described. Cells
were then processed for ChIP using a dual cross-linking protocol, with slight
modifications. In brief, cells were fixed on the dish with 2 mM di(N-succinimidyl) 
glutarate (Sigma, 80424) in PBS for 45 min at room temperature. After two washes 
with PBS, cells were re-fixed with 1% formaldehyde/PBS for 10 min at room tem- 
perature, before quenching the reaction with 1/20 volume of 2.5 M glycine for 
5 min at room temperature. The cells were then collected with a cell scraper in 
PBS supplemented with 1% BSA and protease inhibitors (Roche, 11697498001), 
and collected by spinning. Cell lysis and chromatin isolation were done using the 
Ideal ChIP-seq kit for histones (Diagenode, C01010051). Chromatin was sheared 
using a Bioruptor Pico (Diagenode B01060001) with 10 cycles of 30 s on/off
sonication. The samples were prepared in triplicate from different plates. Primary
myogenic cells (2 x 10°) were used per ChIP and 2 × 10^4 cells were used per
input. The immunoprecipitations were performed following the manufacturer's
guidelines using 6 µl of anti-RBPJ antibody (Cell Signalling, #5313) or 1.5 µl of
rabbit control IgG antibody (Diagenode, C15410206) in a final volume of 300 µl
per ChIP. The purification of the immunoprecipitated DNA was performed using 
DiaPure columns (Diagenode, C03040001). RT-qPCR was performed using 
FastStart Universal SYBR Green Master mix (Roche, 04913914001) and analysis 
was performed using the 2^−ΔΔCt method normalized to the Neg16 region.
Quiescent satellite cells. Satellite cells were isolated from adult Tg:Pax7-nGFP 
mice using in situ fixation to preserve Notch signalling from dissociation- 
induced downregulation. Cells were fixed as above in 2 mM di(N-succinimidyl)
glutarate for 45 min, followed by 10 min with 1% formaldehyde at room temperature.
Cell lysis and chromatin isolation were performed using the Auto True MicroChIP
kit (Diagenode, C01010140). Chromatin was sheared as above with 10 cycles of 30 s
on/off sonication using a Bioruptor Pico. Two hundred thousand cells were used
per ChIP and 2 × 10^3 per input, and IPs were performed using 2 µl of anti-RBPJ
antibody (Cell Signalling, 5313) or 0.5 µl of rabbit control IgG antibody following
the manufacturer's guidelines. Immunoprecipitated chromatin preparations and
input were purified using the Auto iPure kit v2 (Diagenode). RT-qPCR was performed
using FastStart Universal SYBR Green Master mix (Roche, 04913914001)
and analysis was performed using the 2^−ΔΔCt method normalized to the Neg16
region. Primers used for ChIP-qPCR are listed in Supplementary Table 1. 
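A 2^−ΔΔCt normalization to a negative-control region such as Neg16 can be sketched as below. This is one common formulation, assumed here rather than taken from the authors' scripts: each region's IP Ct is first referenced to its input Ct, and the target's ΔCt is then compared with the Neg16 ΔCt. Because both regions share the same input fraction, the input dilution correction cancels. The function names and Ct values are hypothetical.

```python
"""Fold enrichment of a ChIP target over a negative-control region (sketch)."""
from statistics import mean


def delta_ct(ct_ip: float, ct_input: float) -> float:
    """IP signal relative to input chromatin for one primer pair."""
    return ct_ip - ct_input


def fold_enrichment(target_ip, target_input, neg_ip, neg_input) -> float:
    """2^-ddCt enrichment of a target region relative to a negative region.

    Each argument is a list of replicate Ct values, averaged first. The
    input dilution factor is identical for both regions, so it cancels
    out of ddCt and is not applied here.
    """
    d_target = delta_ct(mean(target_ip), mean(target_input))
    d_neg = delta_ct(mean(neg_ip), mean(neg_input))
    ddct = d_target - d_neg
    return 2.0 ** (-ddct)


if __name__ == "__main__":
    # Hypothetical Ct values: target 4 cycles "earlier" than Neg16 -> 16-fold.
    print(fold_enrichment([26.0], [24.0], [30.0], [24.0]))
```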

Cell culture and collagen incubation. Satellite cells isolated by FACS were plated at 
3 x 10° cells per cm² on ibiTreat µ-slides (Ibidi, 80826) pre-coated with 0.1% gelatin
for 2 h at 37°C. Cells were cultured in satellite cell growth medium containing
DMEM (Gibco) supplemented with F12 (50:50; Gibco), 1% PS, 20% FBS (Gibco)
and 2% Ultroser (Pall; 15950-017) at 37°C, 3% O2, 5% CO2 for the indicated time.
Twelve hours after plating, collagens (COLI rat tail, BD Biosciences, 354236; COLV
human placenta, Sigma, C3657; COLVI human placenta, AbD Serotec 2150-0230)
resuspended in acetic acid (HOAc) at 1 mg/ml were added to the culture medium at a
final concentration of 50 µg/ml, and cells were fixed with 4% PFA for 10 min at
room temperature. To assess proliferation, cells were pulsed with the thymidine
analogue EdU (1 × 10^−5 M) 2 h before fixation (ThermoFisher Click-iT Plus EdU
kit, C10640). Inhibitors used: Obtustatin (Integrin α1β1; Tocris, 4664, 100 nM),
TC-I 15 (Integrin α2β1; Tocris, 4527, 100 µM), RGDS peptide (all Integrins; Tocris,
3498, 100 µM) and 7rh (DDR1; a gift from K. Ding, 20 nM).
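Diluting the 1 mg/ml collagen stocks to 50 µg/ml is a C1·V1 = C2·V2 calculation, i.e. a 1:20 dilution. The helper below is a hypothetical illustration; the 500 µl well volume in the example is an assumption, not a value from the protocol. Matching the HOAc-only control to the same added volume is likewise an assumption about good practice, not a stated step.

```python
"""C1*V1 = C2*V2 dilution helper for the collagen treatments (sketch)."""


def stock_volume(final_volume: float, final_conc: float, stock_conc: float) -> float:
    """Volume of stock to add so that the mixture reaches final_conc.

    Concentrations must share a unit (e.g. ug/ml); the result has the unit
    of final_volume, which is taken to include the added stock.
    """
    if stock_conc <= 0:
        raise ValueError("stock concentration must be positive")
    if final_conc > stock_conc:
        raise ValueError("final concentration cannot exceed stock concentration")
    return final_volume * final_conc / stock_conc


if __name__ == "__main__":
    well_ul = 500.0  # hypothetical culture volume per well
    v = stock_volume(well_ul, final_conc=50.0, stock_conc=1000.0)
    print(f"add {v:.1f} ul of 1 mg/ml collagen per {well_ul:.0f} ul well")
```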

Muscle fixation and histological analysis. Embryo forelimbs were fixed in 
4% PFA/0.1% Triton for 2 h, washed overnight with 1× PBS, immersed in
20% sucrose/PBS overnight, embedded in OCT, frozen in liquid nitrogen and
sectioned transversely at 12–14 µm. Isolated tibialis anterior muscles were immediately
frozen in liquid-nitrogen-cooled isopentane and sectioned transversely
at 8 µm. For PAX7 staining on adult tibialis anterior muscle, sections were
post-fixed with 4% PFA for 15 min at room temperature. After 3 washes with 1× PBS,
antigen retrieval was performed by incubating sections in boiling 10 mM citrate 
buffer pH 6 for 10 min. Sections were then blocked, permeabilized and incubated 
with primary and secondary antibodies as described in Immunostaining on cells, 
sections and myofibres. 

Single myofibre isolation and siRNA transfection. Single myofibres were 
isolated from extensor digitorum longus muscles following the previously 
described protocol. In brief, extensor digitorum longus muscles were dissected
and incubated in 0.1% w/v collagenase (Sigma, C0130)/DMEM for 1 h in a 37°C
shaking water bath at 40 r.p.m. Following enzymatic digestion, mechanical dissociation
was performed to release individual myofibres that were then transferred to
serum-coated Petri dishes. Single myofibres were transfected with Col5a1 siRNA, 
Col5a3 siRNA (Dharmacon SMARTpool Col5a1 (12831) L-044167-01 and Col5a3
(53867) L-048934-01-0005) or scramble siRNA (Dharmacon ON-TARGETplus 
Non-targeting siRNA #2 D-001810-02-05) at a final concentration of 200 nM, 
using Lipofectamine 2000 (ThermoFisher, 11668) in Opti-MEM (Gibco). Four
hours after transfection, 6 volumes of fresh satellite cell growth medium were
added and fibres were cultured for 72 h at 37°C, 3% O2. Myofibres were fixed for
15 min in 4% PFA before immunostaining for proliferation, differentiation and
self-renewal markers.

Immunostaining on cells, sections and myofibres. Following fixation, cells and 
myofibres were washed three times with PBS, then permeabilized and blocked at 
the same time in buffer containing 0.25% Triton X-100 (Sigma), 10% goat serum 
(Gibco) for 30 min at room temperature. For BrdU immunostaining, cells were 
unmasked with DNase I (1,000 U/ml, Roche, 04536282001) for 30 min at 37°C.
Cells and fibres were then incubated with primary antibodies (Supplementary 
Table 2) for 4 h at room temperature. Samples were washed with 1× PBS
three times and incubated with Alexa-conjugated secondary antibodies (Life 
Technologies, 1/1,000) and Hoechst 33342 (Life Technologies, 1/5,000) for 45 min 
at room temperature. EdU staining was chemically revealed using the Click-iT Plus 
kit according to manufacturer’s recommendations (Life Technologies, C10640). 
For collagen staining, the myofibres and the muscle sections were incubated with 
0.1% Triton X-100 for 30 min at room temperature. Myofibres and sections were 
then washed 3 x 10 min and incubated with 10% goat serum in PBS for 30 min. 
After one wash, samples were incubated with primary antibodies and secondary 
antibodies as described in Supplementary Table 2. Confocal images were acquired 
with a Leica SPE microscope and Leica Application Suite or with Zeiss LSM 700 
microscope and Zen Blue 2.0 software. Three-dimensional images were recon- 
structed from confocal Z-stacks using Imaris software. The Section view function 
was used to inspect the environment of the satellite cells by showing the cut in the 
x, y and z axes.

Reserve cell cultures. Enzymatically dissociated muscles were plated in gelatin- 
coated dishes (1/30 of total mouse muscles per cm²) in the satellite cell growth
medium described above. When myotube formation was detected (day 7 to 10),
recombination was induced by addition of 4-hydroxytamoxifen (4-OHT; Sigma,
H6278) at a final concentration of 1 µM every other day. Seven days later,
4-OHT-containing medium was replaced every other day with fresh medium containing
elcatonin (0.1 U/ml), for an additional 10 days. To assess proliferation, cells were
pulsed with 1 × 10^−5 M EdU for 6 h before fixation (10 min, 4% PFA). Reserve cells
were defined by immunofluorescence as PAX7+EdU− cells. For each medium
change, only half of the conditioned medium was removed and replaced by an 
equal volume of fresh medium. 
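Because only half of the conditioned medium is exchanged at each change, a compound is diluted geometrically rather than removed at once. The sketch below quantifies this under idealized assumptions (perfect mixing, no degradation, uptake or binding), which the protocol does not state. The choice of five changes over the 10-day elcatonin phase is likewise an illustrative assumption.

```python
"""Carry-over of a compound under partial medium changes (idealized sketch)."""


def remaining_fraction(n_changes: int, fraction_replaced: float = 0.5) -> float:
    """Fraction of an initial compound left after n partial medium changes.

    Assumes ideal mixing and ignores degradation, uptake or binding.
    """
    if not 0.0 <= fraction_replaced <= 1.0:
        raise ValueError("fraction_replaced must be between 0 and 1")
    if n_changes < 0:
        raise ValueError("n_changes must be non-negative")
    return (1.0 - fraction_replaced) ** n_changes


def concentration_after_changes(n_changes: int, fresh_conc: float,
                                fraction_replaced: float = 0.5) -> float:
    """Level of a compound supplied only in fresh medium, starting from zero."""
    return fresh_conc * (1.0 - remaining_fraction(n_changes, fraction_replaced))


if __name__ == "__main__":
    n = 5  # assumption: one half-change every other day for 10 days
    print(f"4-OHT carry-over: {remaining_fraction(n):.1%} of the starting level")
    print(f"elcatonin: {concentration_after_changes(n, 0.1):.4f} U/ml")
```

Under these assumptions, about 3% of the 4-OHT would remain after five half-changes, while elcatonin would approach its 0.1 U/ml feed concentration.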

Construction of luciferase reporters and luciferase assays. For the generation 
of luciferase reporters, candidate enhancers of Col5a1, Col5a3, Col6a1 and Col6a2 
(a shared enhancer), and Hey1 were amplified by PCR from genomic DNA of 
C2C12 cells. The enhancers were then cloned into the firefly-luciferase pGL3- 
Basic vector (Promega, E1751) upstream of a minimal thymidine kinase promoter. 
The sequences of enhancers are listed in Supplementary Table 3. Transfected cells 
(Lipofectamine LTX, Life Technologies, 15338030) were lysed and luciferase signal 
was scored using the Dual-Luciferase Reporter Assay System (Promega, E1910). 
For normalization, Renilla luciferase (pCMV-Renilla) was transfected at 1:20 ratio 
relative to firefly-luciferase constructs. 
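Normalizing firefly to Renilla luciferase on a per-well basis corrects for differences in transfection efficiency and cell number. The sketch below averages per-well firefly/Renilla ratios and optionally expresses a condition relative to a reference condition. The function names and example readings are hypothetical, and averaging the per-well ratios (rather than taking a ratio of averages) is an assumed choice.

```python
"""Firefly/Renilla normalization of dual-luciferase readings (sketch)."""
from statistics import mean


def relative_luminescence(firefly, renilla) -> float:
    """Mean of per-well firefly/Renilla ratios for one condition."""
    if len(firefly) != len(renilla) or not firefly:
        raise ValueError("need paired, non-empty firefly and Renilla reads")
    if any(r <= 0 for r in renilla):
        raise ValueError("Renilla reads must be positive")
    return mean(f / r for f, r in zip(firefly, renilla))


def fold_over_reference(sample_rlu: float, reference_rlu: float) -> float:
    """Express one normalized condition relative to a reference condition."""
    return sample_rlu / reference_rlu


if __name__ == "__main__":
    nicd = relative_luminescence([2000.0, 2200.0], [100.0, 110.0])  # hypothetical
    gfp = relative_luminescence([500.0], [100.0])                   # hypothetical
    print(nicd, gfp, fold_over_reference(nicd, gfp))
```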

RNA isolation and RT-qPCR. Total RNA was extracted from satellite cells isolated
by FACS using the QIAGEN RNeasy Mini kit and reverse transcribed using
SuperScript III (Invitrogen, 18080093), according to the manufacturer's instructions.
RT-qPCR was performed using FastStart Universal SYBR Green Master
mix (Roche, 04913914001) and analysis was performed employing the 2^−ΔΔCt
method and using the average of the control values as a reference. Specific
forward and reverse primers used in this study are listed in Supplementary Table 1.
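The 2^−ΔΔCt calculation with "the average of the control values as a reference" can be sketched as follows. Assumptions: each sample is a pair of Ct values (gene of interest, reference gene such as Tbp or Gapdh, the normalizers named in the figure legends), and the calibrator is the arithmetic mean of the control ΔCt values. With this choice the geometric mean of the control fold values is exactly 1. Names and numbers are hypothetical.

```python
"""2^-ddCt relative expression against the mean of control samples (sketch)."""
from statistics import mean


def delta_ct(ct_gene: float, ct_reference: float) -> float:
    """Ct of the gene of interest minus Ct of the reference gene."""
    return ct_gene - ct_reference


def relative_expression(samples, controls):
    """Return 2^-ddCt for each sample.

    samples, controls: lists of (ct_gene, ct_reference) pairs. The calibrator
    is the mean delta-Ct of the controls, so the controls' fold values have
    a geometric mean of exactly 1.
    """
    if not controls:
        raise ValueError("at least one control sample is required")
    calibrator = mean(delta_ct(g, r) for g, r in controls)
    return [2.0 ** -(delta_ct(g, r) - calibrator) for g, r in samples]


if __name__ == "__main__":
    ctrl = [(25.0, 20.0), (27.0, 20.0)]           # hypothetical control Ct pairs
    print(relative_expression([(23.0, 20.0)], ctrl))
    print(relative_expression(ctrl, ctrl))
```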
Stable cell line manipulations. The mouse myoblast cell line C2C12 was cultured
in DMEM/20% FBS/1% PS at 37°C, 5% CO2.

Notch activation. Notch activation was achieved by plating cells on DLL1-coated 
dishes or by doxycycline-inducible Notch constructs, as previously described.

Calcr retrovirus preparation and transduction. Calcitonin receptor C1a-type
(pMXs-Calcr-C1a-IRES-GFP) and mock control (pMXs-IRES-GFP) retrovirus
vectors were prepared as previously described. In brief, 48 h after transfection
of Platinum-E cells the supernatant was recovered and used to transduce C2C12
cells. Two days later stably labelled GFP+ C2C12 cells were isolated by FACS. All
stable cell lines used in this study are negative for mycoplasma contamination.

Quantification of cAMP. Transduced mock (IRES-GFP) and Calcr
(CalcR-C1a-IRES-GFP) C2C12 cells were isolated by FACS based on GFP expression
and seeded on 0.1% gelatin-coated, white culture 96-well plates (Falcon,
353296) at 3 x 10% cells per well. After overnight culture, the cells were incubated
with the complete induction medium containing DMEM/1% PS/500 µM
isobutyl-1-methylxanthine (Sigma, I7018)/100 µM 4-(3-butoxy-4-methoxy-benzyl)
imidazolidone (Ro 20-1724, Sigma, B8279)/MgCl2 40 mM, collagen, solvent (HOAc)
or elcatonin (0.1 U/ml) for 3 h. The amount of intracellular cAMP was measured 
using cAMP-Glo Max Assay (Promega, V1681) following the manufacturer's pro- 
tocol. Luminescence was quantified with FLUOstar OPTIMA (BMG Labtech). 
The EC50 value was determined with GraphPad Prism software using a sigmoid
dose-response curve (variable slope). 
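A variable-slope sigmoidal dose-response curve is a four-parameter logistic model. The sketch below writes it on a log10 concentration axis, Y = Bottom + (Top − Bottom) / (1 + 10^((logEC50 − X)·HillSlope)), and adds its inverse. Only the 25.05 µg/ml EC50 comes from the text (Fig. 4d); the bottom, top and Hill slope values are hypothetical. Estimating the parameters from data is a nonlinear least-squares problem (for example `scipy.optimize.curve_fit`), which is omitted here so the block stays dependency-free.

```python
"""Variable-slope sigmoidal dose-response model (four-parameter logistic)."""
import math


def four_param_logistic(conc: float, bottom: float, top: float,
                        ec50: float, hill: float) -> float:
    """Response at concentration conc (conc and ec50 in the same unit)."""
    if conc <= 0 or ec50 <= 0:
        raise ValueError("concentrations must be positive")
    x = math.log10(conc)
    return bottom + (top - bottom) / (1.0 + 10.0 ** ((math.log10(ec50) - x) * hill))


def concentration_for_response(y: float, bottom: float, top: float,
                               ec50: float, hill: float) -> float:
    """Invert the model: concentration that produces response y."""
    if not min(bottom, top) < y < max(bottom, top):
        raise ValueError("response must lie strictly between bottom and top")
    ratio = (top - bottom) / (y - bottom) - 1.0
    return ec50 / ratio ** (1.0 / hill)


if __name__ == "__main__":
    params = dict(bottom=1.0, top=5.0, ec50=25.05, hill=1.2)  # hypothetical fit
    print(four_param_logistic(25.05, **params))  # halfway between bottom and top
    y = four_param_logistic(50.0, **params)
    print(concentration_for_response(y, **params))  # recovers 50 ug/ml
```

At the EC50 the model returns exactly the midpoint between Bottom and Top, which is what "half-maximal effective concentration" means for a fold-cAMP curve.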

Biotinylation of collagens. Commercial collagen proteins (COLI rat tail, BD 
Biosciences, 354236; COLV human placenta, Sigma, C3657) were biotinylated 
using the Pierce EZ-Link Biotinylation Kit, with slight modifications. In brief, 
20 µl of 1 M HEPES was added to 0.5 ml of 1 mg/ml collagen dissolved in 0.5 M
HOAc. Then, 20 µl of 100 mM biotin reagent were added and incubated at room
temperature for 1.5 h. Biotinylated collagens were next dialysed in 25 mM HEPES,
2.5 M CaCl2, 125 mM NaCl, 0.005% Tween (Slide-A-Lyzer MINI Dialysis Device,
Thermo Fisher 88401) overnight at 4°C. 

On-cell enzyme-linked immunosorbent assay. Transduced mock and Calcr 
C2C12 cells were seeded on a clear-bottom 96-well plate (TPP, 92096) at a density
of 3 x 10° cells per well. After overnight culture, cells were treated with 50 µg/ml
of biotinylated collagens for 2 h and fixed with 4% PFA/PBS for 15 min. After 
3x PBS washes, cells were blocked with a solution containing 10% goat serum, 
2% BSA, PBS for 1 h at room temperature, washed and incubated for 1 h at room 
temperature with goat anti-mouse biotin-HRP antibody (Jackson, 1/1,000,
115-035-003). After 3 PBS washes, the HRP signal was developed by addition
of the HRP substrate 3,3′,5,5′-tetramethylbenzidine (1-Step Ultra TMB-ELISA, Sigma,
34028), and absorbance at 650 nm was measured once every 30 s for 30 min with
FLUOstar OPTIMA (BMG Labtech). The signal was normalized to the background 
signal (no secondary antibody) and to the number of cells assessed by Janus green 
staining (Abcam, ab111622). 
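The two normalizations described above (background without secondary antibody, and cell number from Janus green) and the ratio over non-treated cells shown in Fig. 4b can be combined as in this sketch. Treating "normalized to the background" as a subtraction is an assumption, as are the function names and example absorbances.

```python
"""Normalization of on-cell ELISA absorbance readings (sketch)."""


def cell_normalized_signal(a650: float, background: float,
                           janus_green: float) -> float:
    """Background-subtracted A650 per unit of Janus green (cell-number proxy).

    Assumption: normalization to the no-secondary background is a subtraction.
    """
    if janus_green <= 0:
        raise ValueError("Janus green signal must be positive")
    return (a650 - background) / janus_green


def ratio_over_untreated(treated: float, untreated: float) -> float:
    """Ratio plotted against the non-treated reference (NT = 1)."""
    if untreated == 0:
        raise ValueError("untreated signal must be non-zero")
    return treated / untreated


if __name__ == "__main__":
    colv = cell_normalized_signal(0.9, 0.1, 0.4)  # hypothetical readings
    nt = cell_normalized_signal(0.5, 0.1, 0.4)
    print(ratio_over_untreated(colv, nt))
```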

Statistical analysis. No statistical methods were used to predetermine sample size. 
The investigators were not blinded to allocation during experiments and outcome 
assessment. No animal has been excluded from analysis and no randomization 
method has been applied in this study. For comparison between two groups, two- 
tailed paired and unpaired Student's t-tests were performed to calculate P values 
and to determine statistically significant differences (see legends of Figs. 1-4). 
Additional specific statistical tests are detailed in legends of Figs. 1-4. All exper- 
iments have been done twice with the same results. All statistical analyses were 
performed with Excel software or GraphPad Prism software; Kruskal-Wallis test 
was performed in R. 
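For reference, the paired and unpaired (pooled-variance) Student's t statistics used for two-group comparisons can be computed as below. This is a minimal sketch that returns the statistic and degrees of freedom only. The two-sided P value would then come from the t distribution, for example `2 * scipy.stats.t.sf(abs(t), df)`, or directly from `scipy.stats.ttest_rel` and `scipy.stats.ttest_ind`. Example data are hypothetical.

```python
"""Student's t statistics for paired and unpaired comparisons (sketch)."""
import math
from statistics import mean, stdev


def paired_t(a, b):
    """Paired t statistic and degrees of freedom (n - 1) for matched samples."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [x - y for x, y in zip(a, b)]
    se = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / se, len(diffs) - 1


def unpaired_t(a, b):
    """Unpaired Student's t (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two values")
    pooled = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    se = math.sqrt(pooled * (1.0 / na + 1.0 / nb))
    return (mean(a) - mean(b)) / se, na + nb - 2


if __name__ == "__main__":
    print(paired_t([1.0, 2.0, 3.0], [0.0, 0.0, 1.0]))
    print(unpaired_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```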

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. All data that support the findings of this study are available from 
the corresponding authors upon request. 
Extended Data Fig. 1 | Identification of NICD/RBPJ-bound enhancers
and response to activation of Notch signalling. a, Gene expression
microarray data show that satellite cells express a specific subset of
collagen types, which include the fibrillar COLI (Col1a1 and Col1a2),
COLIII (Col3a1, possibly as (α1(III))3 homotrimer) and COLV (Col5a1,
Col5a2 and Col5a3) and the non-fibrillar COLIV (Col4a1 and Col4a2),
COLVI (Col6a1 and Col6a2) and COLXV (Col15a1, possibly as (α1(XV))3
homotrimer). Data are shown as a heat map of normalized collagen
transcripts expressed at different developmental time points (E12.5,
E17.5 and post-natal day (P)8; Tg:Pax7-nGFP, Gene Expression Omnibus
(GEO) accession number GSE52192), quiescent and post-injury (t = 60 h
after BaCl2 injury). b, ChIP-seq tracks indicating NICD/RBPJ-occupied
enhancers, associated with mouse Col5a1, Col5a3, Col6a1 and Col6a2
loci. H3K4me1, H3K27ac, p300 and NICD are shown. Orange rectangles
indicate RBPJ binding positions and asterisks indicate the enhancers
used for transcriptional activity assays in c. c, Core sequences of the
selected NICD/RBPJ-bound enhancers (asterisked orange rectangle in
Fig. 1a and in b). The RBPJ consensus binding motif is highlighted in
yellow. d, Transcriptional response of isolated enhancers to activation of 
Notch signalling in C2C12 cells. Firefly luciferase signal was measured 
in cells with doxycycline-inducible expressed human Notch1-GFP 
(NICD) and GFP control cells treated with (2S)-N-[(3,5-difluorophenyl) 
acetyl]-L-alanyl-2-phenyl] glycine 1,1-dimethylethyl ester (DAPT) and 
were normalized to internal control (pCMV-Renilla). Data are expressed 
as relative luminescence units (n = 3 independent experiments). Data
are mean ± s.d.; two-sided paired t-test. e, Expression measurements,
based on RNA sequencing, of collagen genes in myogenic C2C12 cells, 
with active (treated with Delta-like 1) or inhibited (treated with DAPT) 
Notch signalling for 6 or 24 h (data available at GEO, accession number 
GSE37184). Data are shown as Delta-like 1-to-DAPT ratios of average 
reads per kilobase of exon model per million mapped reads (RPKMs). 
Genes with low expression (RPKM < 2) were eliminated. HeyL and Hey1
transcripts indicate Notch pathway activation. Red line designates no
change (ratio = 1).
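The ratio computation in panel e, average RPKM under Delta-like 1 divided by average RPKM under DAPT with low-expression genes removed, can be sketched as follows. Assumptions: replicate RPKMs are averaged per condition, and a gene is dropped when either condition averages below 2 RPKM (the legend does not say which condition the cut-off applies to). Gene names and values are hypothetical placeholders.

```python
"""Delta-like 1 / DAPT RPKM ratios with a low-expression filter (sketch)."""
from statistics import mean

MIN_RPKM = 2.0


def dll1_dapt_ratios(rpkm, min_rpkm=MIN_RPKM):
    """Map gene -> mean(DLL1 RPKM) / mean(DAPT RPKM).

    rpkm: {gene: {"DLL1": [replicates], "DAPT": [replicates]}}.
    Assumption: a gene is eliminated when either condition averages below
    min_rpkm. A ratio of 1 means no change.
    """
    ratios = {}
    for gene, reads in rpkm.items():
        dll1, dapt = mean(reads["DLL1"]), mean(reads["DAPT"])
        if dll1 < min_rpkm or dapt < min_rpkm:
            continue  # low-expression gene, eliminated
        ratios[gene] = dll1 / dapt
    return ratios


if __name__ == "__main__":
    data = {
        "geneA": {"DLL1": [40.0, 44.0], "DAPT": [10.0, 11.0]},  # hypothetical
        "geneB": {"DLL1": [1.0, 1.5], "DAPT": [0.5, 0.7]},      # below cut-off
    }
    print(dll1_dapt_ratios(data))
```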
Extended Data Fig. 2 | Notch signalling regulates Col5 and Col6
expression in vivo. a, Satellite cells isolated by FACS at day 10 after
tamoxifen injections, from resting tibialis anterior muscle from control
(Tg:Pax7-CT2;Rbpj^+/−;R26^mTmG/+) and Rbpj-null (Tg:Pax7-CT2;Rbpj^−/−;
R26^mTmG/+) mice immunostained for RBPJ. b, RT-qPCR of collagen genes
in Rbpj cKO and control satellite cells. Hey1 used as control for Notch
signalling (n = 3 mice per genotype). c, Induction of collagen genes in
E17.5 control (Myf5^Cre/+;R26^mTmG/+) and Myf5^Cre-NICD (Myf5^Cre/+;
R26^stop-NICD-nGFP/+) cells isolated by FACS. RT-qPCR was normalized to
Gapdh, n = 3 fetuses per genotype. HeyL reports Notch activity. d, FACS
plots showing fractionation of GFP+ cells from E17.5 Tg:Pax7-nGFP
fetuses into Pax7^high (20% of population), Pax7^mid (40%), and Pax7^low
(20%). The intensity of the GFP signal reflects the activity of the Pax7
promoter. e, Transcript levels of GFP+ cells isolated by FACS show a tight
correlation between lineage progression, Notch signalling activity and
collagen gene expression (n = 3 fetuses per genotype). f, Specificity of
α3-COLV antibody assessed by immunostaining of tibialis anterior muscle
transverse section from wild-type and Col5a3 cKO P14 postnatal pups
(n = 3 mice per genotype). g, Time course of gene expression performed
by RT-qPCR on freshly isolated satellite cells (Quiescent), 48 h or 60 h
after cardiotoxin injury of tibialis anterior muscle (48 hours post injury
(hpi), 60 hpi), and isolated single myofibres from extensor digitorum
longus muscle of Tg:Pax7-nGFP mice. Col5a1 and Col5a3 were strongly
downregulated in activated and differentiated cells. Quiescence (Pax7,
Calcr) and differentiation (Myog) markers are indicated. Col4a2, a major
component of the basement membrane, is expressed mainly by myofibres
(n = 3 mice per condition). Data are mean ± s.d.; one-sided unpaired
t-test. Scale bars, 50 µm.
Extended Data Fig. 3 | COLV delays proliferation and differentiation
of satellite cells. a, Experimental scheme: isolated Tg:Pax7-nGFP satellite
cells cultured overnight (o/n) before collagen treatment. b, Myosin heavy
chain (MyHC) and EdU staining of satellite cells treated with COLI or
COLV. Fusion index: 82%, 86% and 33% for HOAc solvent, COLI and
COLV, respectively (n = 3 mice, >250 cells, 2 wells per condition).
c, Percentage of EdU+ primary myogenic cells after ten days of culture
with indicated collagens. EdU: 2.6%, 1.3% and 18.2% for COLI, COLVI
and COLV, respectively (n = 3 mice, >250 cells, 2 wells per condition).
d, Experimental scheme for control and cKO mice. Satellite cells
were plated overnight before collagen treatment. e, GFP and MyHC
immunostaining of Rbpj cKO satellite cells (n = 3 mice per condition)
incubated 60 h in presence of COLI or COLV, or with HOAc control (n = 3
mice, >200 cells, 2 wells per condition). f, Percentage of EdU+ cells (2 h
pulse) of Rbpj-null primary myogenic cells, after ten days of culture with
HOAc or indicated collagens. EdU: 1.0% and 7.6% for COLI and COLV,
respectively (n = 3 mice, >150 cells, 2 wells per condition).
g, RT-qPCR on GFP+ Rbpj-null satellite cells isolated by FACS and
cultured for 72 h in the presence of COLI or COLV. Results are
normalized to Tbp. Data are mean ± s.d.; two-sided paired t-test; P value:
two-sided unpaired t-test. Scale bars, 50 µm.
Extended Data Fig. 4 | COLV, and specifically α3-COLV, is
critical for satellite cell self-renewal. a, RT-qPCR of Col5a1 in
control (Ctr; Tg:Pax7-CT2;Col5a1^+/+;R26^mTmG), heterozygous (Het;
Tg:Pax7-CT2;Col5a1^flox/+;R26^mTmG) and conditional knockout (cKO;
Tg:Pax7-CT2;Col5a1^flox/flox;R26^mTmG) mice two weeks after tamoxifen
diet (n = 3 mice per genotype). b, Transcript levels of the different Col5
mRNA chains in C2C12 after transfection of either control scramble,
Col5a1 or Col5a3 siRNA, showing the specificity of each siRNA for its
given targeted mRNA. Data are normalized to Tbp gene expression
(n = 3 independent assays). c, Col5a1 and Col5a3 siRNA transfection
of Tg:Pax7-nGFP isolated single myofibres cultured for 72 h and
immunostained for GFP and MYOD. Resident satellite cells enter
the myogenic program and form clusters composed of proliferating
(PAX7+MYOD+MYOG−), differentiated (PAX7−MYOG+) and
self-renewed (PAX7+MYOD−) cells within 72 h. Quantification of
PAX7+MYOD−, PAX7+MYOD+ and PAX7−MYOD+ populations 72 h
after transfection. Scramble siRNA was used as negative control (n > 15
fibres counted from 3 mice). Data are mean ± s.d.; a, two-sided unpaired
t-test; b, c, two-sided paired t-test. Scale bar, 50 µm.
Extended Data Fig. 5 | Screening for COLV receptor candidates
identifies CALCR. a, Screening for the COLV receptor: satellite cells from 
Tg:Pax7-nGFP mice were incubated for ten days with COLV and candidate 
receptors were targeted with respective inhibitors: 7rh for DDR1 (sub- 
panels C, D), the broad-spectrum Integrin-binding competitor RGDS 
peptide (sub-panels E, F), Obtustatin for Integrin α1β1 (sub-panels G, H),
TC-I 15 for Integrin α2β1 (sub-panels I, J). DMSO solvent was used as a
control for TC-I 15 and 7rh (sub-panels A, B). Satellite cell differentiation 
was assayed by MyHC immunostaining. b, EdU (2 h pulse) and CALCR 
staining of GFP+ C2C12 cells isolated by FACS and transduced with
Calcr-GFP or mock-GFP retrovirus and cultured for 24 h with COLI (top)
or COLV (bottom). Quantification of EdU+ Calcr-transduced C2C12
cells or mock-GFP cells treated for 24 h with COLV or with the controls,
COLI and HOAc (n = 5 independent experiments, >250 cells counted,
2 wells per condition). There was no significant difference between HOAc
and COLI treated samples (data not shown). c, Experimental scheme of 
tamoxifen administration to control (Ctr) (Calcr^+/+) and cKO (Calcr^flox/flox)
mice. FACS plot of satellite cells from Pax7^CreERT2/+;Calcr^flox/flox;R26^stop-YFP
and Pax7^CreERT2/+;Calcr^+/+;R26^stop-YFP mice. Cells sorted based on YFP
expression. d, Control and Calcr cKO satellite cells isolated by FACS, fixed
immediately after sorting and immunostained for CALCR to confirm the
absence of CALCR protein from recombined cells. For control (upper
panel), two fields from the same culture dish are shown, separated by a
white line. Asterisk shows a non-recombined, CALCR+ cell in the cKO
sample (lower panel). e, Quantification of PAX7+, Myogenin+ and EdU+
cells in Calcr-depleted satellite cells (Pax7^CreERT2/+;Calcr^flox/flox;R26^stop-YFP)
isolated by FACS and treated for 32 h or 72 h with COLI or COLV (n = 3
mice, >250 cells counted, 2 wells per condition). f, Quantification of total
PAX7+ (GFP), Myogenin+ and EdU+ myogenic cells isolated by FACS
from Tg:Pax7-nGFP mice three days after cardiotoxin injury of tibialis
anterior muscle, and incubated for 72 h in presence of COLI or COLV,
or HOAc as a control, in the culture medium (n = 3 mice, >200 cells
counted). g, CALCR protein in freshly isolated satellite cells, or satellite 
cells cultured for 12 h, from Tg:Pax7-nGFP mice, demonstrating that 
CALCR protein is still present when satellite cells are treated with different 
collagens (see Extended Data Fig. 2). h, Induction of Calcr transcript 
expression by RT-qPCR of Tg:Pax7-nGFP satellite cells isolated by FACS 
and cultured for 72 h in the presence of COLI or COLV. Results are 
normalized to Tbp (n =3 mice). i, Immunostainings for CALCR protein 
of Tg:Pax7-nGFP satellite cells cultured for 72 h in presence of COLI or 
COLV (n = 3 mice, >50 cells, 2 wells per condition). Data are mean ± s.d.;
b, two-sided unpaired t-test; c–i, two-sided paired t-test. Scale bars,
25 µm (g), 50 µm (a, b, d, i).
Extended Data Fig. 6 | CALCR ligand elcatonin can substitute the
depletion of the surrogate ligand COLV. a, Intracellular levels of cAMP 
in Calcr-transduced C2C12 cells treated with COLV for up to 480 min 
(n=4 independent assays). b, Rescue of loss of COLV by elcatonin in an 
ex vivo self-renewal reserve-cell model, where PAX7+ non-proliferative
cells return to quiescence (see Methods). MyHC and PAX7 staining
of control (Ctr: Tg:Pax7-CT2;Col5a1^+/+;R26^mTmG) and Col5a1-null
(Tg:Pax7-CT2;Col5a1^flox/flox;R26^mTmG) cells, non-treated (NT) or treated
with elcatonin. No GFP+EdU+ cells (12 h pulse) could be detected under
any of the conditions, indicating GFP+ cells are quiescent (data not
shown). c, Quantification of percentage of reserve cells (PAX7+ per total
nuclei) (n = 3 mice per genotype and condition, >350 cells counted).
Elcat, elcatonin. Data are mean ± s.d.; two-sided paired t-test; #, P value
calculated by two-sided unpaired t-test. Scale bar, 50 µm.
Extended Data Fig. 7 | Schematic of Notch-COLV-CALCR axis in
satellite cells. A Notch-COLV-CALCR signalling cascade actively 
maintains satellite cell quiescence. Satellite cells are in direct contact with 
the plasma membrane of the myofibre (black outline) and an overlying 
basement membrane (blue line). Activation of the Notch receptor is 
achieved by a ligand (probably DLL1 or DLL4) present on the muscle 
fibre. Induction of Col5a1 and Col5a3 (and also Col6a1 and Col6a2) genes 
occurs via distal regulatory elements (grey box). Satellite-cell-produced
COLV is deposited under the basement membrane and acts as a surrogate
ligand of the plasma membrane receptor CALCR, also expressed by the 
satellite cells, thereby propagating a cell-autonomous signalling system in 
the local niche. In the absence of COLV (deletion of Col5a1) the quiescent 
niche is disturbed, CALCR signalling is abrogated, and satellite cells 
spontaneously differentiate and fuse to myofibres, leading to exhaustion of 
the muscle stem cell pool. 
Reconstruction of antibody dynamics and infection
histories to evaluate dengue risk 


Henrik Salje, Derek A. T. Cummings, Isabel Rodriguez-Barraquer, Leah C. Katzelnick, Justin Lessler,
Chonticha Klungthong, Butsaya Thaisomboonsuk, Ananda Nisalak, Alden Weg, Damon Ellison, Louis Macareo,
In-Kyu Yoon, Richard Jarman, Stephen Thomas, Alan L. Rothman, Timothy Endy & Simon Cauchemez


As with many pathogens, most dengue infections are subclinical and 
therefore unobserved. Coupled with limited understanding of the 
dynamic behaviour of potential serological markers of infection, this 
observational problem has wide-ranging implications, including 
hampering our understanding of individual- and population- 
level correlates of infection and disease risk and how these change 
over time, between assay interpretations and with cohort design. 
Here we develop a framework that simultaneously characterizes 
antibody dynamics and identifies subclinical infections via Bayesian 
augmentation from detailed cohort data (3,451 individuals with 
blood draws every 91 days, 143,548 haemagglutination inhibition 
assay titre measurements). We identify 1,149 infections (95%
confidence interval, 1,135-1,163) that were not detected by active
surveillance and estimate that 65% of infections are subclinical.
One year after infection, individuals have reached a stable set-point
antibody load that places them within or outside a risk window.
Individuals with pre-existing titres of <1:40 develop haemorrhagic
fever 7.4 (95% confidence interval, 2.5-8.2) times more often than
naive individuals, compared with 0.0 times for individuals with
titres >1:40 (95% confidence interval, 0.0-1.3). Plaque reduction
neutralization test titres <1:100 were similarly associated with
severe disease. Across the population, variability in the size of
epidemics results in large-scale temporal changes in infection and
disease risk that correlate poorly with age.

Despite the large body of literature from observational and cohort 
studies describing dengue cases, we still have major difficulties in 
explaining individual- and population-level differences in infection 
and disease risk. These difficulties mostly arise from a fundamental
methodological issue in research on many pathogens: individual
histories of infection are difficult to capture. The four dengue
virus serotypes (DENV1-DENV4), which are found across tropical and
sub-tropical regions and lead to an estimated 390 million infections
each year, cause a range of disease manifestations, from asymptomatic
infection to death. High levels of subclinical infection indicate that
even in regions with thorough active surveillance, the majority of
infections are missed. This observational problem has wide-ranging
implications because it not only hampers our ability to estimate the underlying
level of infection in the community and characterize individual risk 
factors for infection and severity, but also our ability to assess correlates 
of protection, dynamically monitor susceptibility at both the population 
and individual level, define optimal thresholds for the interpretation of 
serological assays and critically assess cohort design. 

Here, we develop an analytical framework that can address this 
challenge, leading to new insights into a broad range of questions. We 
use this framework to both characterize antibody changes following 


1,2,3,13 


infection and identify infection events that were missed by surveillance 
on the basis of the analysis of longitudinal data from cohort studies. 
We apply the analysis to data from a school-based cohort study in 
Thailand (n= 3,451, mean age at recruitment was 9 years old, inter- 
quartile range, 8-11), in which subjects had blood taken on average 
every 91 days for up to five years and when illnesses were detected 
through active surveillance’. Surveillance of active fever and school 
absence of children was conducted from June to mid-November when 
DENV circulation is concentrated”. Haemagglutination inhibition tests 
were used to measure antibody titres of each serotype in each sample 
(143,548 haemagglutination inhibition measurements in total). Plaque 
reduction neutralization test (PRNT) titres were also measured on a 
subset of 1,771 samples. Haemagglutination inhibition titres correlated 
closely with PRNT titres (Pearson correlation of 0.91) and with inhi- 
bition enzyme-linked immunosorbent assays (ELISAs), although titre 
values differ between laboratories and between assays®°. 

To track the evolution in the measured antibody titres of an individual
(Fig. 1a), we placed titres on an adjusted log2 scale (titres of 1:10
were given a value of 1, 1:20 a value of 2 and so on). There were 274
detected symptomatic DENV infections (Fig. 1b); 62 children were
hospitalized (23%) and 36 had dengue haemorrhagic fever (DHF)
(13%). In cases for which the infecting serotype was known through
PCR (79% of cases, Supplementary Table 1), we observed a sharp rise
and subsequent decay in log2 titres after the onset of symptoms (Fig. 1c, d).
The mean log2 titre of the infecting serotype was 0.79 (95% confidence
interval, 0.74-0.84) times the mean log2 titre of the non-infecting serotypes
in the three months before onset of symptoms, compared with 0.94 (95%
confidence interval, 0.93-0.96) times in the six months after the onset
of symptoms (Fig. 1e). Because 86% of cases with symptomatic infections
had detectable titres against at least one serotype before infection, the
higher antibody titres of non-infecting serotypes before onset probably
capture responses to prior infections.

We reconstructed the antibody trajectories of each individual by
assuming that infection leads to an increase in titres that subsequently
decays exponentially. We also explored biphasic responses (Extended
Data Fig. 1). We allow for variability in antibody kinetics across
individuals and infections, and for differential rises for the infecting versus
the non-infecting serotypes for primary infections but undifferentiated
responses for subsequent infections. We use data augmentation
techniques to impute undetected infections (subclinical infections during
active surveillance, or infections of unknown symptom status outside the
surveillance windows) and to identify the serotype of undetected primary
infections. Instead of relying on fixed cut-offs to identify infections,
data augmentation allows us to incorporate uncertainty in the
existence, timing and serotype of unobserved infection events and therefore
Fig. 1 | Titre responses following infection. a, Measured (dots) and
model fit (lines) for three example individuals. Each dot represents the 
mean titre across the four serotypes. The pink-shaded regions are periods 
of active surveillance. Blue arrows represent confirmed symptomatic 
dengue infections. Black arrows represent estimates of timing of 
subclinical infections from an augmented dataset. During the active 
surveillance windows, these augmented infections represent subclinical 
infections whereas outside the surveillance window, it is unknown if the 
individual had symptoms. b, Serotype distribution of PCR-confirmed 
symptomatic infections (green, DENV1; blue, DENV2; maroon, DENV3; 
orange, DENV4; black, unknown serotype). The grey bars represent the 
estimated distribution of infections not detected by active surveillance. 


we can probabilistically assess whether differences in measured titres 
are due to infections or assay variability. 

We find that after post-primary infections, there is a mean increase
of 5.8 (95% confidence interval, 5.6-5.9) in log2 titres across serotypes,
which decreases by 76% after one year. For primary infections (that
is, individuals without detectable titres before infection), the mean
increase in log2 titre is 7.6 (95% confidence interval, 7.4-7.8) for the
infecting serotype and 6.6 for non-infecting serotypes (95% confidence
interval, 6.4-6.7). The similarity in titres of infecting and non-infecting
serotypes, coupled with assay variability, suggests that in a clinical
setting individual haemagglutination inhibition measurements cannot
reliably determine the infecting serotype. We find that by one year after
infection, titres largely stabilize at a set point (the ‘set-point antibody
load’; Fig. 1d). There is substantial variability between infections: the
interquartile range of the increase in log2 titre one year after infection
is 0.7-2.2 across all infections (Extended Data Fig. 2a). We find that
even after accounting for historic infection status, measured DENV2
titres are systematically lower than those of other serotypes (0.85 lower
than DENV1; Extended Data Fig. 2b and Supplementary Table 2), which
could reflect technical characteristics of the DENV2 assay or inherent
differences in immune responses to DENV2. We estimate the measurement
error in the haemagglutination inhibition assay (that is, the
standard deviation in any reading) to be 0.49 (95% confidence interval,
0.49-0.50), which is similar to the empirically estimated standard deviation
from repeated testing of the same serum and 2.6 times the error
estimates of the PRNT (Extended Data Fig. 2c). Despite the variability
in individual readings, because we use many readings from four
serotypes for each participant and titres appear to behave in a stable and
predictable manner, we can nevertheless make robust inferences when
considering the ensemble of the measurements.
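The rise-and-decay behaviour described above can be illustrated with a toy trajectory. This is a minimal sketch, not the authors' model: it assumes a hypothetical single-phase parameterization in which a fixed fraction of the initial rise persists as the set point while the remainder decays exponentially, and it ignores the infecting versus non-infecting distinction for primary infections. Only the 5.8 log2-unit mean post-primary rise and the 76% decline after one year come from the text; the decay rate is an illustrative assumption.

```python
import math


def titre_boost(t_days, initial_rise, persistent_fraction, decay_rate_per_day):
    """Boost in log2 titre above the pre-infection level, t_days after infection.

    Hypothetical single-phase form: a fraction `persistent_fraction` of the
    initial rise is retained long term (the set point) and the remainder
    decays exponentially at `decay_rate_per_day`.
    """
    if t_days < 0:
        raise ValueError("t_days must be non-negative")
    transient = (1.0 - persistent_fraction) * math.exp(-decay_rate_per_day * t_days)
    return initial_rise * (persistent_fraction + transient)


# Illustrative values: 5.8 is the reported mean post-primary rise; retaining
# 24% long term mirrors the reported 76% decline after one year. The decay
# rate (0.02 per day) is an assumption chosen so that the transient part has
# essentially vanished by one year.
RISE, KEEP, RATE = 5.8, 0.24, 0.02

for day in (0, 90, 180, 365, 730):
    print(day, round(titre_boost(day, RISE, KEEP, RATE), 2))
```

Under these assumptions the boost settles at about 5.8 × 0.24 ≈ 1.4 log2 units above baseline, which plays the role of the set-point antibody load in this toy version.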

We probabilistically identify 1,149 undetected infections (95% range 
across model iterations, 1,135-1,163), of which 507 (494-520) occurred 
The periods of active surveillance are in pink (5.5 months per year).
c, Model fit (lines) and observed (dots) titres pre- and post-infection for
primary infections (infecting serotype in blue, non-infecting serotypes
in red) and subsequent infections (green). d, Mean difference between
the observed log2 titres at different time points after infection and those
at one year after infection for all augmented and observed infections
(average of 1,421 total infections across 100 reconstructed datasets) with
95% confidence intervals. Ref., reference time point. e, Titre ratio of the
infecting serotype to the mean of the three non-infecting serotypes before and after
symptom onset with 95% confidence intervals for the 217 individuals with
symptomatic infections for which the infecting serotype was detected
(n = 3,366 total titre measurements).


during active surveillance periods and were therefore subclinical
(Fig. 1b). Overall, we estimate that 35% of infections are symptomatic
(95% confidence interval, 34-36%). The temporal distribution of
subclinical infections was correlated with that of symptomatic infections
(Pearson correlation of 0.78, 95% confidence interval, 0.70-0.84). Using
augmented primary infections for cases in which we could confidently
assign the infecting serotype (same serotype implicated by >50% of
iterations), we find that 34% of undetected primary infections (and
39% of subclinical primary infections) were due to DENV4, compared
to only 3% of all symptomatic infections (none of which were primary
infections; Extended Data Fig. 3a, b). We find consistent results using
a more stringent cutoff to assign the infecting serotype (Extended Data
Fig. 3c). These findings are consistent with a reduced risk of disease
from DENV4 compared to other serotypes, resulting in a mostly silent
DENV4 epidemic. This is supported by a phylogenetic analysis that
found that DENV4 was widespread in Thailand throughout this period
(see supplementary figure 4 of Salje et al.). This suggests that the
serotype distributions from hospital-based or community-based
surveillance may not be representative of infections in the population and
supports previous evidence that the transmissibility of a serotype can
be delinked from its propensity to cause symptomatic and/or severe
disease. Furthermore, these results indicate that factors that
contribute to transmission potential (for example, viral replication, peak
titres or infection length) are not predictive of adverse outcomes.
We find that the underlying probability of infection and the probability
of developing disease are strongly linked to the mean antibody titre
at the time of exposure. Overall, an individual's annual risk of infection
was 17%, varying from 21% for individuals with mean measured log2
titres of <2, to 16% for those with log2 titres of 2-3 and 11% for those
with log2 titres of >3 (Fig. 2a). Using logistic regression, we find that
for log2 titres of >2, each unit increase in log2 titre is associated with
a 0.71 relative risk of infection (95% confidence interval, 0.67-0.76).


rt of Springer Nature. All rights reserved. 


[Fig. 2a, b plots: x axis, log2 titre (0-6); top axis, linear titre (1:10, 1:40, 1:160)]


Fig. 2 | Probability of infection and disease as a function of titre. 

a-d, Annualized probability of infection (a), developing any symptoms 
(b), being hospitalized (c) and developing DHF (d) as a function of the 
mean measured antibody titre across all serotypes at the time of exposure 


The annual probability of having a symptomatic infection varies from
6.4% (95% confidence interval, 4.9-8.4%) for primary infections to
8.4% (95% confidence interval, 7.8-9.1%) for individuals with
pre-existing log2 titres <3 (<1:40 on a linear scale) and 4.0% (95% confidence
interval, 3.0-5.0%) for those with log2 titres >3 (Fig. 2b). The annual
probability of being hospitalized during a primary infection was 1.2%
(95% confidence interval, 0.5-2.1%), compared to 2.4% (95% confidence
interval, 2.1-2.7%) during a subsequent infection for those with
pre-existing log2 titres <3 and 0.3% for those with log2 titres >3 (95%
confidence interval, 0.09-0.6%; Fig. 2c). Even more pronounced was
the difference in the risk of developing DHF, which ranged from 0.2% (95% confidence
interval, 0.0-0.6%) for primary infections to 1.5% (95% confidence
interval, 1.3-1.7%) for subsequent infections in those patients with
log2 titres <3 and 0.0% for log2 titres >3 (95% confidence interval,
0.0-0.4%; Fig. 2d). Within this study population, an average of 54%
of the population had detectable log2 titres of <3 at any point in time.
Time-varying Cox proportional hazards models that specifically
account for the dependence of titre observations within individuals
gave similar results (Extended Data Fig. 4). Using log2 titres to
probabilistically identify the cohort participants with detectable titres who
will develop DHF gives an area under the curve (AUC) of 0.66
(Extended Data Fig. 5).

When considering only infected individuals, we observe no difference
in the probability of subclinical infection by titre; however, the
probability of hospitalization and DHF remains highest in those
with pre-existing log2 titres of <3 (Extended Data Fig. 6a-c). Only one
individual with pre-infection log2 titres >3 developed DHF during
surveillance, compared with 146 individuals with titres in the same range
at infection who did not. This suggests that once infection takes place,
antibodies are not protective against developing symptoms as such,
but that titre levels are associated with the risk of severe disease.
We observe no difference in the risk of disease given
infection across years (Supplementary Table 3) or age (Supplementary
Table 4). Further studies are needed to investigate whether age groups
younger than those included here nevertheless have an increased risk.
PRNTs form the basis of current discussions on immune correlates.
Among those infected, individuals with detectable PRNT log2 titres
of <4.5 (equivalent to approximately <1:100) have a 7.5 times (95%
confidence interval, 2.4-11.6) higher risk of DHF than previously
naive individuals, compared with 0.0 times for those with higher
titres (Extended Data Fig. 6d-f). Cross-reactive titres that result from
exposure to non-DENV flaviviruses such as Japanese encephalitis virus and
Zika virus may be included in these risk estimates.

Our findings suggest that set-point antibody loads after infection
are important in determining individual infection
and disease risk. After infection, we estimate the daily probability of a
subsequent infection and of the development of DHF as a function
across all study subjects (n = 3,451). The open circles on the left represent
primary infections (that is, those with no detectable titres for any serotype 
before exposure). The shaded regions represent 95% bootstrap confidence 
intervals. 


of titre dynamics. We demonstrate that the probability of both
infection and disease stabilizes after one year (Fig. 3). On the basis of the
observation in Fig. 2 that individuals with detectable titres of <3 had
an increased risk of infection and disease, we explored the temporal
evolution of risk following infection for those with set-point antibody
loads (that is, the titre at one year following infection) above and below
this threshold. At one year, we observe a 2.1 times higher risk of
infection (irrespective of disease outcome) for those with set-point antibody
loads of <3 compared to those with greater antibody loads, and an 8.9
times higher risk of infection that leads to DHF. Overall, we find that
three years after infection, 34% of individuals with set-point antibody
loads of <3 suffer a subsequent infection, irrespective of severity (95%
confidence interval, 33-35%), compared to 23% for those with greater
loads (95% confidence interval, 20-26%). Over the same period, 3.5% of
individuals with set-point loads of <3 develop DHF (2.4-4.4%),
compared to none of those with higher loads. The apparent stability
of set-point antibody loads suggests that it may be possible to assess
the long-term risk of an individual.
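The three-year proportions above are read from Kaplan-Meier curves (Fig. 3). As a reminder of how that estimator works, here is a minimal, dependency-free sketch of the standard Kaplan-Meier product-limit estimate for a single dataset. It is an illustration only: the paper averages over 100 reconstructed datasets, which this sketch does not attempt, and the function name, input format and toy data are hypothetical.

```python
def kaplan_meier(durations, events):
    """Product-limit estimate of the probability of remaining free of a subsequent infection.

    durations: follow-up time for each infection (e.g. days since the previous infection)
    events: True if a subsequent infection was observed at that time, False if censored
    Returns a list of (time, survival) pairs at each time with at least one event.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            n_events += int(data[i][1])
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        # Both events and censorings at t leave the risk set afterwards.
        at_risk -= n_leaving
    return curve


# Toy data (not from the study): five follow-up times, two of them censored.
curve = kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False])
cumulative_incidence = [(t, 1.0 - s) for t, s in curve]
```

Reading 1 minus the survival value at three years gives the cumulative proportion with a subsequent infection, which is the form in which figures such as the 34% and 23% above are expressed.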

Our findings are consistent with the hypothesis that low titres generated by some
candidate vaccines in previously naive individuals ‘prime’ them
for severe disease upon their first exposure. This hypothesis is
supported by previous evidence that primary infections in infants with
maternal antibodies and secondary infections in older individuals are
associated with severe disease. Furthermore, a Nicaraguan study
found elevated risk of severe disease for those with low inhibition
ELISA titres at prior annual blood draws. Previously naive individuals
given the Dengvaxia vaccine had mean PRNT titres within our risk
window (Fig. 4d). Further work is required to understand whether
immunity acquired from vaccination and from natural infection is
qualitatively similar and whether the risk window described here is
relevant for vaccine recipients. T cell immunity, which is not captured
by these assays, might compensate for antibody titres in this window.
Vaccine studies should carefully assess the criteria used to define
seroconversion, and how titres correlate with disease risk over time. Our
work suggests that previously used criteria (PRNT titre >1:10) do not
adequately correlate with reduction in disease risk, and that
haemagglutination inhibition titres >1:40 or PRNT titres of >1:100
may provide a starting point for any vaccine in identifying a targeted
neutralizing antibody response. Placebo data from the Dengvaxia
vaccine trials also suggest that higher PRNT titres are linked to protection.
The targeted vaccination of individuals who have pre-existing antibody
titres within our risk window may be a viable approach to minimize the
public health burden from dengue by moving individuals out of the
risk window (Fig. 4d). Even in an endemic setting such as our cohort,
there is considerable temporal variability in the serological status of
9-year-old individuals (Extended Data Fig. 7), suggesting that the
current WHO guidance surrounding Dengvaxia or similar guidance that
Fig. 3 | Risk of subsequent infection and disease following an

infection event. Data are from the average of 1,420 infections across 100 
reconstructed datasets. a, c, The probability of survival from subsequent 
infection (irrespective of disease outcome (a) and for those that led to 
DHF (c)) as calculated from Kaplan-Meier curves for those infections 
with set-point antibody titres of <3 (red) and >3 (blue) with 95% 
confidence intervals. The annualized probability of a subsequent infection 
(irrespective of disease outcome (b) and for those that led to DHF (d)) at 
different time points following infection for those infections with set-point 
antibody titres of <3 (red) and >3 (blue). 


is based on serostatus at vaccination will have to carefully consider this 
variation or specifically screen individuals. 

Our approach allows us to consider wider problems concerning
drivers of dengue epidemiology. The assumption that population-wide
immunity varies across time and dictates multi-annual dynamics of
dengue pervades the literature and dominates current hypotheses about
what drives large outbreaks of dengue in particular settings.
More generally, the idea that temporally varying population immunity
drives temporal dynamics of pathogens pervades infectious disease
epidemiology. However, quantitative evidence that any population
varies in dengue immune status over time is mostly unavailable,
as is a link between the immune status of a population and the risk of
epidemics in empirical data. Here, although we have only a short time
series, we show that, owing to the heterogeneity in the size of annual
epidemics, the risk of having titres within the risk zone for
different birth cohorts is more correlated with the epidemic time point
(Fig. 4a, mean correlation of 0.70) than with age (Fig. 4b, mean
correlation of 0.23). Although both the probability of being naive and that of having
log2 titres above the risk zone correlated with age, strong birth-cohort
effects also exist (Extended Data Fig. 7). For example, among
9-year-olds, we observe up to a twofold difference in the probability of being
naive, depending on the year of the study.

Finally, our results can guide the design of cohort studies aiming
to characterize transmission. Studies typically use a fourfold rise
in titres against any serotype as evidence of infection, regardless
of the timing of sample collection. Using our titre trajectories, we find
that if blood draws are taken every 90 days, a fourfold cutoff point on
measured titres has a specificity of >99% and a sensitivity of 87% (Fig. 4c and
Fig. 4 | Evolution of population risk, implications for vaccine and
cohort design. a, Proportion of study participants who have titres in the
risk zone (defined as detectable log2 titres <3) during the study period for
different birth cohorts (coloured lines) and overall (black). The epidemic
curve of all infections is shown in grey. b, Proportion of study participants
with titres in the risk zone as a function of age for different birth cohorts
(coloured lines) and overall (black). c, Performance of the current assay testing
protocol, in which infection events are defined as a rise above a cutoff
point in any serotype across two blood draws. d, Relationship between
PRNT titre and haemagglutination inhibition (HI) titre for samples for
which both assays were performed (n = 1,771 samples). The box plots
show 2.5, 25, 75 and 97.5 quantiles as well as the mean. Superimposed are
the results from the Dengvaxia vaccine study for previously seronegative
(blue) and seropositive (red) samples before (open symbols) and after
(filled symbols) vaccination.


Extended Data Fig. 8). The sensitivity is reduced to 77% when blood
is taken every six months and 62% when blood is taken annually,
although it may be higher in seasonal settings when samples are taken
at the end of the season. Using an alternative approach based on the
mean titre across the four serotypes and a 1.6-fold cutoff point, the
sensitivity of the assay improves to 96% when samples are taken every
six months and to 90% for annual bleeds (specificity >95%; Extended
Data Fig. 9). We provide the optimal cutoff point and estimated
sensitivity for these approaches, as well as a theoretical estimate in which titres
are on a continuous scale (such as PRNT) and for which a minimum
specificity of >99% is required (Extended Data Fig. 9).
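The two detection rules compared above can be written directly on the adjusted log2 scale, where a fourfold rise is an increase of 2 units and a 1.6-fold rise in the mean titre is an increase of log2(1.6) ≈ 0.68 units. A minimal sketch, assuming each blood draw is a list of four log2 titres (DENV1-DENV4) and that a rise exactly equal to the cutoff counts as positive; the function names are hypothetical, and the paper's optimal cutoffs were derived from its fitted trajectories rather than fixed as here.

```python
import math

FOURFOLD_LOG2 = math.log2(4)       # 2.0 units on the adjusted log2 scale
MEAN_CUTOFF_LOG2 = math.log2(1.6)  # about 0.68 units


def any_serotype_rule(before, after, cutoff=FOURFOLD_LOG2):
    """Conventional rule: infection if any serotype rises by at least `cutoff`."""
    return any(b - a >= cutoff for a, b in zip(before, after))


def mean_titre_rule(before, after, cutoff=MEAN_CUTOFF_LOG2):
    """Alternative rule: infection if the mean across serotypes rises by at least `cutoff`."""
    return sum(after) / len(after) - sum(before) / len(before) >= cutoff


def sensitivity_specificity(calls, truth):
    """Compare rule calls with known infection status (e.g. from simulated trajectories)."""
    tp = sum(1 for c, t in zip(calls, truth) if c and t)
    fn = sum(1 for c, t in zip(calls, truth) if not c and t)
    tn = sum(1 for c, t in zip(calls, truth) if not c and not t)
    fp = sum(1 for c, t in zip(calls, truth) if c and not t)
    return tp / (tp + fn), tn / (tn + fp)


# A single-serotype jump is caught by the first rule but not by the second ...
print(any_serotype_rule([1, 2, 0, 1], [3, 2, 0, 1]), mean_titre_rule([1, 2, 0, 1], [3, 2, 0, 1]))
# ... while a broad, modest rise is caught only by the mean-titre rule.
print(any_serotype_rule([2, 2, 2, 2], [3, 3, 3, 2]), mean_titre_rule([2, 2, 2, 2], [3, 3, 3, 2]))
```

Averaging across serotypes trades sensitivity to a large single-serotype jump for sensitivity to broad, cross-reactive rises, which is one way to read why the mean-titre rule holds up better at longer sampling intervals.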

We demonstrate through simulation that our framework can
recover the true number of subclinical infections and parameters
when only 30% of infections are symptomatic (Supplementary Table 5).
Our approach is also robust to a scenario in which there are
differential rises in titres for symptomatic and non-symptomatic infections
(Supplementary Table 6) and to one in which we incorporate school-specific
force of infection parameters (Supplementary Table 7). In addition,
we find that the timing (Extended Data Fig. 10a) and the serotype
(Extended Data Fig. 10b) of undetected infections cluster in the same
locations as symptomatic infections. This provides strong support
for our modelling framework by suggesting that the model can correctly
identify spatiotemporal clustering of otherwise undetected infections.
These findings also support focal transmission, irrespective of
disease outcome. The approach presented here will be applicable
across disease systems for which longitudinal titre data exist, allowing
a wide range of insights into fundamental questions of disease ecology
and risk.
METHODS


Cohort study design. Individuals attending 12 different schools in the Kamphaeng
Phet district, a rural region of northern Thailand, were recruited into a dengue
cohort study that ran between 1998 and 2003, as previously described. All
individuals were between 7 and 13 years old. Blood samples were taken four times a year
(in January, June, August and November) with an average of 91 days between blood
draws. In addition, from the start of June to mid-November each year, active
school-based surveillance was conducted. Children who missed
school due to febrile illness had additional acute and convalescent blood draws.
Dengue infection was confirmed either by RT-PCR on the acute sample, in which
case the infecting serotype was also recorded, or by antibody detection (IgM ELISA
values >40 or a more than fourfold increase in haemagglutination inhibition titre
between acute and convalescent blood draws), in which case the infecting serotype was
not known. The date of symptom onset, whether or not the child was hospitalized
and whether or not they developed DHF were also recorded. Note that the cohort
study was conducted before 2009, when the WHO provided new guidance for the
characterization of different levels of dengue severity.

Antibody measurements. For each blood draw of each individual, antibody titres
against DENV1, DENV2, DENV3 and DENV4 were measured using a haemagglutination
inhibition assay. The following twofold dilutions were used: 1:10, 1:20,
1:40, 1:80, 1:160, 1:320, 1:640, 1:1,280 and 1:2,560. We translated each titre onto
a log2 scale such that 1:10 was given a value of 1, 1:20 a value of 2 and so on.
Undetectable titres (those with a titre of <1:10) were given a value of 0. For a
subset of 800 individuals, 1,771 samples were also tested using PRNTs. These
samples were either paired samples from individuals with symptomatic, confirmed
infection, with one sample taken from a time point before symptom onset and one
sample after symptom onset (n = 75 pairs), or randomly chosen sequential blood
samples from individuals without a detected symptomatic infection between the
blood draws.
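For detectable titres, the adjusted scale described above is equivalent to log2(titre / 5), so 1:10 maps to 1, 1:40 to 3 and 1:2,560 to 9. A minimal sketch of the conversion in both directions, assuming titres are supplied as reciprocal dilutions (40 for 1:40) and undetectable readings as 0 or None; the helper names are hypothetical.

```python
import math

# Reciprocal dilutions used in the haemagglutination inhibition assay (from the Methods).
DILUTIONS = [10, 20, 40, 80, 160, 320, 640, 1280, 2560]


def to_adjusted_log2(reciprocal_titre):
    """Map a reciprocal titre onto the adjusted log2 scale; <1:10 (undetectable) maps to 0."""
    if reciprocal_titre is None or reciprocal_titre < 10:
        return 0.0
    return math.log2(reciprocal_titre / 5)


def from_adjusted_log2(value):
    """Inverse map for a detectable value (> 0): reciprocal titre = 5 * 2**value."""
    if value <= 0:
        raise ValueError("values <= 0 correspond to undetectable titres")
    return 5 * 2 ** value


print([to_adjusted_log2(d) for d in DILUTIONS])
# prints [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
```

Assuming PRNT titres are placed on the same adjusted scale, the inverse also reproduces the threshold quoted in the main text: a log2 value of 4.5 corresponds to 5 × 2^4.5 ≈ 113, that is, approximately 1:100.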

Characterizing how titres change after symptomatic infection. We wanted to
understand how titres to both the infecting serotype and to non-infecting serotypes
changed over time before and after symptom onset. For all individuals who
experienced a symptomatic illness for which the infecting serotype was identified, we
identified all titre measurements within each 10-day window from 100 days before
symptom onset to 600 days after symptom onset. For each window, we calculated
the mean titre of the infecting serotype and the average of the mean titres
of the other three serotypes across all individuals who had a blood draw within
that window.
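The windowing step above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: each record is assumed to be one blood draw, given as (days relative to symptom onset, log2 titre of the infecting serotype, log2 titres of the three non-infecting serotypes), and windows are assumed to be half-open 10-day bins starting at day -100. The function name is hypothetical.

```python
from collections import defaultdict


def windowed_titre_means(records, width=10, start=-100, stop=600):
    """Average titres in fixed windows relative to symptom onset.

    records: iterable of (day, infecting_titre, non_infecting_titres) tuples,
             where non_infecting_titres holds the three other serotypes.
    Returns {window_start: (mean infecting titre, mean of per-draw non-infecting means)}.
    """
    bins = defaultdict(list)
    for day, infecting, others in records:
        if not start <= day < stop:
            continue  # outside the -100 to +600 day range
        window_start = start + ((day - start) // width) * width
        bins[window_start].append((infecting, sum(others) / len(others)))

    summary = {}
    for window_start, draws in sorted(bins.items()):
        n = len(draws)
        summary[window_start] = (
            sum(inf for inf, _ in draws) / n,
            sum(other for _, other in draws) / n,
        )
    return summary
```

Dividing the first element by the second in each window gives one way to form a ratio of the infecting to the non-infecting serotypes, in the spirit of Fig. 1e.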

Modelling the dynamics of dengue antibody titres. Previous studies on malaria
have used hidden Markov models to include undetected infections in estimates
of the transmission intensity using presence/absence of specific antibodies in
longitudinal data. Although these efforts were able to improve estimates of the
infection strength within a community compared to using symptomatic
individuals alone, they did not incorporate the changing dynamics of antibody titres over time.
By specifically including titre dynamics, we can help to understand a wide range
of issues, including assay error, measures of protection and risk, and cohort design.
Notation. We consider an individual $i$. We denote as $n_i(t)$ the number of times the
individual was infected before time $t$. Each dengue infection of individual $i$ is labelled by
the index $\omega = 1, \ldots, n_i(t)$. We denote as $T_{i,\omega}$ the time of infection number $\omega$ of
individual $i$, and $S_{i,\omega}$ is the infecting serotype of infection number $\omega$ of individual $i$.
The history of infection (that is, the timing and serotype of all infections since
birth) of individual $i$ up to time $t$ is labelled $H_i(t)$. We denote as $N_i$ the total
number of times the individual had blood taken during the study. Each blood draw
of individual $i$ is labelled by the index $\pi = 1, \ldots, N_i$. We denote as $t_{i,\pi}$ the time of
blood draw $\pi$ for individual $i$. We denote as $\bar{A}_{i,s,\pi}$ the true antibody titre (see
‘Measurement model’), and $A_{i,s,\pi}$ is the measured antibody titre for individual $i$ for
serotype $s$ at blood draw $\pi$. $\Lambda_i(t)$ represents the cumulative infection strength
exerted on individual $i$ before time $t$. The parameter vector is denoted by $\theta$.
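Hierarchical structure of the model. We can break down the probability of a measured
antibody titre into three components:

$$P\big(A^{*}_{i,s,k}, A_{i,s,k}, H_i(\tau_{i,k}) \mid \theta\big) = P\big(A^{*}_{i,s,k} \mid A_{i,s,k}, \theta\big)\, P\big(A_{i,s,k} \mid H_i(\tau_{i,k}), \theta\big)\, P\big(H_i(\tau_{i,k}) \mid \theta\big)$$

The first part represents the ‘measurement model’, the second part the ‘antibody
dynamics model’ and the third part the ‘infection model’.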

Measurement model. We model the underlying antibody levels on a continuous
scale; however, the haemagglutination inhibition assay is a discrete assay, such that
in the absence of measurement error or systematic biases, a true antibody titre
between any two dilutions would be measured as the lower of the two dilutions.


For example, a true titre of 2.7 would be measured as 2 (assuming dilutions are
performed at 0, 1, 2, 3, ...). Measurement error is also likely to exist, and there may
be underlying differences between serotypes (that is, serotype-specific biases) in the
assay that affect all measurements of antibodies against a particular serotype. We
consider a ‘true titre’ to represent the underlying (but unmeasured) titre on a
continuous scale. A ‘measured titre’ is the value that is actually measured by the
assay. Conditional on the history of infection of an individual, we assume
independence between the measurements of the different serotypes. This seems a
reasonable assumption as assays are performed separately for each serotype. The
probability of the measured titre $A^{*}_{i,s,k}$ is:


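$$P\big(A^{*}_{i,s,k} \mid A_{i,s,k}, \theta\big) = \int_{A^{*}_{i,s,k}}^{A^{*}_{i,s,k}+1} f(x)\,dx$$

where $f(x)$ is the density of a normal distribution with mean $A_{i,s,k} + \chi_s$ and
standard deviation parameter $\sigma$, where

$$\chi_s = \begin{cases} 0 & \text{if } s = \text{DENV1} \\ \chi_2 & \text{if } s = \text{DENV2} \\ \chi_3 & \text{if } s = \text{DENV3} \\ \chi_4 & \text{if } s = \text{DENV4} \end{cases}$$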
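A minimal sketch of this measurement model, assuming titres on the dilution scale used above (0, 1, 2, ...): the probability of a measured value is the mass of a normal distribution with mean true titre plus serotype bias on the interval [measured, measured + 1). The function names and the `chi` dictionary are hypothetical, and the sketch does not treat the lowest or highest dilutions as censored, which a full implementation might need to.

```python
import math


def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def measured_titre_prob(measured, true_titre, serotype, chi, sigma):
    """P(measured titre | true titre) for one serotype at one blood draw.

    A draw from N(true_titre + chi_s, sigma) is recorded as the lower dilution of
    the interval [measured, measured + 1) containing it. chi maps serotype -> bias;
    DENV1 is the reference serotype, so its bias defaults to 0.
    """
    mu = true_titre + chi.get(serotype, 0.0)
    return normal_cdf(measured + 1, mu, sigma) - normal_cdf(measured, mu, sigma)


def draw_log_lik(measured, true, chi, sigma):
    """Log-probability of all serotypes at one draw, assuming independence
    between serotypes conditional on the infection history."""
    return sum(
        math.log(measured_titre_prob(measured[s], true[s], s, chi, sigma))
        for s in measured
    )
```

With a small `sigma`, a true titre of 2.7 is recorded as 2 almost surely, matching the worked example in the text.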


Antibody dynamics model. If an individual i was never infected by dengue, we 
assume that they will have titres of 0 against the four serotypes (this assumes that 
any maternal antibodies have disappeared and there is no impact of infections by 
other flaviviruses). At each time point that the individual becomes infected, their 
antibody titres will increase. We assume that this increase can be broken down 
into a permanent increase (representing antibodies that will continue to circulate, 
long after the infection has passed) and a temporary increase (representing the 
short-lived antibodies generated upon infection). 

The permanent rise in titres, $Q_{i,s}(\psi)$, for serotype s from infection number $\psi$
in individual i is modelled as:

$$Q_{i,s}(\psi) = \omega_{i,\psi}\, K(\psi, s)$$

where $\omega_{i,\psi}$ is a random effect that is gamma-distributed with mean parameter $\omega_m$
and variance parameter $\omega_v$, and $K(\psi, s)$ allows a differential antibody response for
each serotype for primary infections: $K(\psi, s) = \rho$ if it is a primary infection (that
is, $\psi = 1$) and s is the infecting serotype; $K(\psi, s) = 1$ otherwise.

We assume that temporary antibody responses decay exponentially over time, so that
the temporary rise in titre against serotype s at time t, $R_{i,s}(t)$, is:

$$R_{i,s}(t) = \gamma_{i,n_i(t)}\, K\big(n_i(t), s\big) \exp\!\big(-\delta_{i,n_i(t)}\,(t - T_{i,n_i(t)})\big)$$

where $\gamma_{i,n_i(t)}$ is a random effect that captures the instantaneous rise in temporary
antibody titres following the most recent infection (infection $n_i(t)$) before time t and
comes from a gamma distribution with mean parameter $\gamma_m$ and variance parameter
$\gamma_v$; $\delta_{i,n_i(t)}$ is the rate of decay of the temporary antibodies and comes from a gamma
distribution with mean parameter $\delta_m$ and variance parameter $\delta_v$. As with the
permanent rise in titres, $K(n_i(t), s)$ allows differential antibody responses for primary
infections: $K(\psi, s) = \rho$ if it is a primary infection (that is, $\psi = 1$) and s is the
infecting serotype; $K(\psi, s) = 1$ otherwise. Additional work is needed to understand
whether alternative functional forms for the rise and decay in antibody titres may
further refine how antibodies behave following infection.

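Under these assumptions and an additional linearity assumption that the temporary
and permanent rises are additive, the antibody titre for serotype s in individual i at
blood draw k is:

$$A_{i,s,k} = Q_{i,s}(\psi = 1) + \cdots + Q_{i,s}\big(\psi = n_i(\tau_{i,k})\big) + \gamma_{i,n_i(\tau_{i,k})}\, K\big(n_i(\tau_{i,k}), s\big) \exp\!\big(-\delta_{i,n_i(\tau_{i,k})}\,(\tau_{i,k} - T_{i,n_i(\tau_{i,k})})\big)$$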
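The additive titre model above can be sketched for a single individual as follows. This is an illustration under stated assumptions, not the authors' implementation: the `Infection` record, the functions `K` and `true_titre`, and the convention that `T` is the day of titre rise are hypothetical, and the baseline titre for pre-enrolment infections is omitted (a naive individual therefore has titre 0).

```python
import math
from typing import NamedTuple


class Infection(NamedTuple):
    T: float        # day of titre rise for this infection
    serotype: int   # infecting serotype
    omega: float    # permanent-rise random effect
    gamma: float    # temporary-rise random effect
    delta: float    # decay rate of the temporary rise


def K(psi, s, infecting, rho):
    """rho for the infecting serotype of a primary infection (psi == 1), else 1."""
    return rho if psi == 1 and s == infecting else 1.0


def true_titre(t, s, infections, rho):
    """Titre against serotype s at time t: sum of the permanent rises from all
    past infections plus the exponentially decaying temporary rise of the most
    recent one."""
    past = sorted((x for x in infections if x.T <= t), key=lambda x: x.T)
    titre = sum(x.omega * K(psi, s, x.serotype, rho)
                for psi, x in enumerate(past, start=1))
    if past:
        last, n = past[-1], len(past)
        titre += (last.gamma * K(n, s, last.serotype, rho)
                  * math.exp(-last.delta * (t - last.T)))
    return titre
```

For a primary DENV1 infection with `omega=2`, `gamma=3` and `rho=1.5`, the titre against DENV1 on the day of the rise is 2 × 1.5 + 3 × 1.5 = 7.5, while the titre against a non-infecting serotype is 2 + 3 = 5.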
Infection history model. We first assume that both the number of infections and
the timing of infections are known. This assumption will subsequently be relaxed.
We assume that each individual can be infected up to four times (once by each
serotype). The history of infection of an individual depends on seasonality in
dengue transmission and differences in infection intensity across years. For a
particular time t, the force of infection is assumed to be:
where $\lambda$ is a parameter that represents the mean daily force of infection in 1998
(the first year of the study) and $\beta_{\lfloor t \rfloor}$ is the mean force of infection in year $\lfloor t \rfloor$
compared to the force of infection in 1998.

For an individual i, the probability of their infection history (and hence their
contribution to the likelihood) can be broken down into periods of infection and
periods without infection. Individuals only contribute to the likelihood during their
time in the study.

For each infection that occurs at time t, the contribution to the log-likelihood is:

$$\log\big(1 - \exp(-\lambda(t))\big)$$

For each individual and each day on which no infection occurs, the contribution to
the log-likelihood with respect to serotype s is $-\lambda(t)$ (that is, a likelihood contribution
of $\exp(-\lambda(t))$) if more than 90 days have passed since an infection by any serotype
and the individual has not previously been infected by serotype s; it is 0 otherwise,
including periods when the individual is not part of the study.

The 90-day window during which no infection can take place prevents more than
one infection event from occurring between two blood draws. This period is
substantially shorter than the estimated two-year period of cross-protection between
serotypes.
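The daily log-likelihood rules above can be sketched for one individual as follows. The sketch assumes a daily force of infection indexed by integer day, and the names `history_log_lik`, `prior` and `SEROTYPES` are hypothetical. Following the text literally, an infection day contributes only the infection term, while a non-infection day contributes $-\lambda(t)$ for each serotype not yet seen, or 0 within 90 days of the last infection.

```python
import math

SEROTYPES = (1, 2, 3, 4)


def history_log_lik(lam, entry, exit_day, infections, prior=(), window=90):
    """Log-likelihood of one individual's infection history during the study.

    lam: sequence indexed by day giving the daily force of infection lambda(t).
    entry, exit_day: first and last study days (inclusive).
    infections: (day, serotype) pairs during the study.
    prior: (day, serotype) pairs before entry, used only for susceptibility.
    """
    infected_on = dict(infections)
    history = sorted(prior)
    ll = 0.0
    for t in range(entry, exit_day + 1):
        if t in infected_on:
            ll += math.log1p(-math.exp(-lam[t]))  # log(1 - exp(-lambda(t)))
            history.append((t, infected_on[t]))
            continue
        if history and t - history[-1][0] <= window:
            continue  # within 90 days of the last infection: contributes 0
        seen = {s for _, s in history}
        # each serotype not yet seen contributes -lambda(t) on this day
        ll -= lam[t] * sum(1 for s in SEROTYPES if s not in seen)
    return ll
```

Periods outside the study are simply not iterated over, which matches their zero contribution.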

In the context of full observation, the probability of the history of infection for
individual i can be given as:

where $T_0$ represents the time of birth and $T_i$ the time point at which individual i
leaves the study (defined as the day of their final blood draw). We assume the same
$\lambda(t)$ for all serotypes.

Situation of imperfect observation. In practice, we do not know the infection history 
of all individuals. Many infections will have occurred before individuals entered 
the study. In addition, there are likely to be many subclinical infections that would
not have been detected through active surveillance. Moreover, active surveillance
only operated for 5.5 months of every year, so infections outside these periods would
also have been missed (irrespective of symptoms).

For the infection history of individuals before they enter the study, we estimate a
baseline titre, $A_{i,s}(t_0)$, that represents the titre for serotype s one year before the
first blood draw. As we assume linearity, such that the temporary and permanent
titres of successive historic infections sum to give the titre at a given moment, this
estimated baseline titre allows us to incorporate the impact of historic infection
events up to one year before enrolment without having to infer infection events
before that time. Individuals who are naive at baseline (defined as those with no
measured titres for any serotype at the first blood draw) are given a baseline titre
of 0. For an individual with no infection events during the study period,
$A_{i,s}(t) = A_{i,s}(t_0)$ for all t.

In the context of full observation during the study period, each individual would
have the serotype and time of each infection, $\{s_{i,\psi}, T_{i,\psi}\}$, known. For undetected
infections, or detected infections for which the infecting serotype is unknown (such
as when symptomatic infections are only detected through IgM ELISA), we can use a
Bayesian data augmentation framework. In this framework, the incompletely observed
$\{s_{i,\psi}, T_{i,\psi}\}$ pairs are incorporated and treated as nuisance parameters. The joint
posterior distribution of the parameters and the augmented data is explored via
reversible-jump Markov chain Monte Carlo (MCMC) sampling.

If we call y the observed data and z the augmented data (the incompletely observed
$\{s_{i,\psi}, T_{i,\psi}\}$ pairs), the joint posterior is:

$$P(\theta, z \mid y) \propto P(y \mid z)\, P(z \mid \theta)\, P(\theta)$$

where $P(y \mid z)$ represents the observation model, $P(z \mid \theta)$ is the titre model outlined
above and $P(\theta)$ gives the prior distribution of the parameters.

The observation model makes sure that the augmented datasets are consistent 
with the observed data by having a value of 1 (if consistent) or 0 (if inconsistent). 
Consistent augmented data have the following characteristics: (1) no individual 
is infected during the study period by the same serotype more than once; (2) no 
individual is infected more than once during a 90-day period. Note that, as DENV
titre responses to non-DENV flaviviruses, such as Zika and Japanese encephalitis,
are likely to be smaller than those to DENV infections, such exposures are unlikely to
be detected by our model and are instead absorbed into measurement uncertainty.
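The 0/1 observation model can be sketched as a consistency check on one individual's combined observed and augmented infections. The function name and the (day, serotype) layout are hypothetical. The 90-day rule is written so that two infections exactly 90 days apart are inconsistent, matching the likelihood's "more than 90 days" condition. The ±10-day constraint on titre rises for symptomatic cases is left out.

```python
def is_consistent(infections, window=90):
    """Observation-model check for one individual's (observed + augmented) infections.

    infections: (day, serotype) pairs. Returns 1 if consistent, 0 otherwise:
    (1) no serotype infects the individual more than once;
    (2) no two infections fall within the same 90-day period.
    """
    ordered = sorted(infections)
    serotypes = [s for _, s in ordered]
    if len(set(serotypes)) != len(serotypes):
        return 0  # repeat infection by the same serotype
    for (d1, _), (d2, _) in zip(ordered, ordered[1:]):
        if d2 - d1 <= window:
            return 0  # two infections within 90 days
    return 1
```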

For all detected (symptomatic) infections, we only know the date of symptom onset
and not the date of infection. To obtain the day of infection for symptomatic cases,
we subtract a fixed period of seven days, the median incubation period for dengue,
from the day of symptom onset. Titres may also not rise on the day of symptom onset
(owing to recall bias in when symptoms started or to individual-level variability).
For symptomatic infections, we therefore treat the true, unobserved day of titre
increase as augmented data, and consider augmented data consistent only if the day
of titre increase lies within ten days of the reported date of symptom onset. For
augmented (undetected) infections, we
assume that the day of titre rise following infection always occurs 11 days after the 
day of infection, which represents an approximate estimate of the time between 
infection and day of titre increase: calculated as the sum of the median incubation 
period for dengue (seven days) and the median time between symptom onset and 
titre increase for the detected infections (four days). 

This cohort used a rolling recruitment approach, which maintained an approxi- 
mately constant-sized population and constitutes an important strength compared 
to cohorts for which the size may be strongly affected by participant dropout. As 
individuals only contributed to the likelihood for their period of inclusion in the 
cohort and dropout is not expected to depend on the history of infection, we do not 
expect that the turnover of participants in the cohort will bias parameter estimates. 
This was demonstrated in a simulation study in which we were able to recover the
true parameters for a simulated cohort with a similar design (see ‘Evaluation of
the model using simulated data’).

We use a log-normal distribution with a log-mean of 0 and log-variance of
1 for the parameters: mean and variance in the permanent rise in log2 titres
($\omega_m$, $\omega_v$), mean and variance in the temporary rise in log2 titres ($\gamma_m$, $\gamma_v$), mean and
variance in the decay in log2 titres per day ($\delta_m$, $\delta_v$), difference in rise for infecting
compared to non-infecting serotype (primary infection only) ($\rho$), measurement
error ($\sigma$), DENV2-4 bias ($\chi_2$, $\chi_3$, $\chi_4$), daily infection strength in 1998 per serotype
($\lambda$), relative infection strengths versus 1998 for 1997 ($\beta_{1997}$) and 1999-2002
($\beta_{1999}$ to $\beta_{2002}$) and the two seasonality parameters.

Estimation using MCMC. We develop a MCMC approach to explore the joint 
posterior distribution of parameters and the augmented data with the following 
steps: 

(1) Metropolis-Hastings update for each of the model parameters $\theta$ in turn, with the
updates performed on a logarithmic scale. The step size of the proposals was
adjusted to obtain an acceptance probability of 20-30%. As the vast majority of
infections are undetected, when updating the six parameters that determine the
rise and decay of antibodies (namely $\omega_m$, $\omega_v$, $\delta_m$, $\delta_v$, $\gamma_m$, $\gamma_v$), we calculate the
likelihood using only the titres from one month before to one year after the
symptomatic (and therefore detected) infections. This approach assumes that the rise
and fall in titres from all infections come from the same distributions, irrespective
of symptom status. More work is needed to understand whether the titre dynamics
following an infection differ depending on whether or not that infection leads to
symptoms.
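Step (1) can be sketched as a random-walk Metropolis-Hastings update on the log scale combined with the log-normal(0, 1) prior described above. This is a generic sketch, not the study's sampler. The names `mh_log_scale` and `tune_step` and the multiplicative 0.9/1.1 step adjustment are assumptions. Because the proposal is symmetric in log θ but the target is defined on θ, the acceptance ratio includes the Jacobian term log(θ′/θ).

```python
import math
import random


def log_prior(x):
    """Log-density (up to a constant) of a log-normal prior with log-mean 0, log-sd 1."""
    lx = math.log(x)
    return -lx - 0.5 * lx * lx


def mh_log_scale(theta, name, log_lik, step, rng):
    """One random-walk Metropolis-Hastings update of theta[name] on the log scale.

    Returns (new_theta, accepted). log(prop / cur) is the Jacobian of proposing
    on the log scale while the target density is defined on the natural scale.
    """
    cur = theta[name]
    prop = cur * math.exp(rng.gauss(0.0, step))
    cand = dict(theta, **{name: prop})
    log_ratio = (log_lik(cand) + log_prior(prop)) - (log_lik(theta) + log_prior(cur))
    log_ratio += math.log(prop / cur)
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return cand, True
    return theta, False


def tune_step(step, acceptance, low=0.2, high=0.3):
    """Shrink or grow the proposal step to target a 20-30% acceptance rate."""
    if acceptance < low:
        return step * 0.9
    if acceptance > high:
        return step * 1.1
    return step
```

With a flat likelihood, the chain should sample the prior, so the logarithm of the parameter should be approximately standard normal.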

(2) For the symptomatic cases, because the day of titre increase may not fall 
exactly on the recorded day of symptom onset, we use an independence sampler 
to update the day of titre increase. At each iteration, the day of the titre increase 
was updated for 100 randomly chosen symptomatic infections. Candidate values 
were chosen using a uniform distribution between 10 days before and 10 days after 
the recorded date of symptom onset. 

(3) Independence sampler for the identity of the infecting serotype for the 
62 symptomatic infections for which the serotype was not identified. At each 
iteration, the serotype for each of these infections is updated with equal probability 
across the four serotypes. 

(4) Independence sampler for the identity of the infecting serotype for the 
undetected infections. At each iteration, the serotype of 500 randomly chosen 
undetected infections is updated with equal probability across the four serotypes. 

(5) Independence sampler for the dates of titre increase for undetected infec- 
tions. At each iteration, the day of infection is updated for 1,000 randomly chosen 
undetected infections. For each infection, the proposal is a uniform distribution 
between one year before entry into the study and the day of the final blood draw. 

(6) Independence sampler for the baseline titres for each individual. At each 
iteration, the baseline titre for one serotype is updated for 1,000 randomly chosen 
individuals. The proposal distribution is a uniform distribution between
0 and 10. All individuals who are naive at baseline (that is, those with no titres to
any serotype at the first blood draw) are forced to have a baseline titre of 0 for all
four serotypes.
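Steps (2) to (6) share the same mechanism: an independence sampler whose proposal is a fixed uniform distribution. Because the proposal density does not depend on the current value, it cancels in the Metropolis-Hastings ratio, which then reduces to the ratio of target densities. The sketch below shows this generic update. The proposal helpers mirroring steps (2) and (6) are hypothetical, and `log_target` stands in for the full conditional posterior.

```python
import math
import random


def independence_update(current, propose, log_target, rng):
    """One independence-sampler Metropolis-Hastings step.

    propose(rng) draws from a fixed uniform distribution that does not depend
    on `current`, so the proposal densities cancel and the acceptance ratio is
    the ratio of target densities.
    """
    candidate = propose(rng)
    log_ratio = log_target(candidate) - log_target(current)
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return candidate
    return current


def propose_titre_rise_day(onset):
    """Hypothetical proposal for step (2): uniform integer day within 10 days of onset."""
    return lambda rng: onset + rng.randint(-10, 10)


def propose_baseline_titre(rng):
    """Hypothetical proposal for step (6): uniform baseline titre on [0, 10]."""
    return rng.uniform(0.0, 10.0)
```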

(7) Reversible-jump MCMC to add/remove unobserved infection events. As 
H(t) is unobserved, we use a Bayesian data augmentation approach that treats it 
as a nuisance parameter. Rather than attempting to definitively identify whether 
an infection occurred or not, these approaches allow us to incorporate the uncer- 
tainty of the presence and timing of these events. We use a reversible-jump MCMC 
model to add and remove infection events. Each step to add undetected infections 
proceeds as follows: 

(i) Randomly draw an individual.

(ii) Draw a candidate date for the infection event using a uniform distribution 
from one year before their first blood draw to the day of their final blood draw. 

(iii) Draw a candidate serotype of infection with the probability of each serotype 
being 0.25. 

(iv) Update the number, date and serotype of infections for that individual. 

For the removal of undetected infections, we use a similar approach: 

(i) Randomly draw an individual.

(ii) If that individual has undetected infections, randomly select one of them with
equal probability (if they have no undetected infections, move to the next
individual).

(iii) Update the number, date and serotype of infections for that individual by 
removing that infection. 
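The add and remove moves above can be combined into a single birth-death step. The text does not state the acceptance probability, so the sketch assumes the standard reversible-jump ratio for these proposals. Add and remove are each chosen with probability 1/2. An add proposes one of 4 × D (day, serotype) pairs uniformly, where D is the number of candidate days, and a remove picks one of the m current undetected infections. The add is then accepted with posterior ratio × 4D / (m + 1), and the remove with posterior ratio × m / (4D). The function name and data layout are hypothetical.

```python
import math
import random


def rj_birth_death(state, log_post, windows, rng):
    """One reversible-jump add/remove move for undetected infections (sketch).

    state: dict individual -> list of (day, serotype) undetected infections.
    windows: dict individual -> (first_day, last_day) candidate dates, i.e. one
        year before the first blood draw to the final blood draw.
    log_post: function(state) -> log posterior; should return -math.inf for
        augmented data that the observation model marks as inconsistent.
    """
    ind = rng.choice(sorted(state))
    first, last = windows[ind]
    n_days = last - first + 1
    current = state[ind]
    proposal = dict(state)
    if rng.random() < 0.5:  # add: uniform date, each serotype with probability 0.25
        new_inf = (rng.randint(first, last), rng.randint(1, 4))
        proposal[ind] = current + [new_inf]
        # reverse move removes this infection with prob 1/(m+1); forward prob 1/(4D)
        log_q = math.log(4 * n_days) - math.log(len(proposal[ind]))
    else:  # remove one undetected infection chosen uniformly
        if not current:
            return state  # nothing to remove: move on
        k = rng.randrange(len(current))
        proposal[ind] = current[:k] + current[k + 1:]
        log_q = math.log(len(current)) - math.log(4 * n_days)
    new_lp = log_post(proposal)
    if new_lp == -math.inf:
        return state
    log_ratio = new_lp - log_post(state) + log_q
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return proposal
    return state
```

As a check, a target that weights each infection by μ/(4D) makes the number of undetected infections Poisson(μ) at stationarity.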

Evaluation of the model using simulated data. In order to evaluate the ability of the 
model to accurately estimate the parameters in a scenario when only a minority 
of infections are observed, we use the same modelling framework on a random 
subset of 1,000 individuals from the study with subsequent changes in titres. We 
include the actual start date and the end date for these individuals (that is, when 
they entered and left the cohort). We simulate infections in these individuals based 
on known parameters. We then randomly ‘unobserve’ 70% of infections to reflect 
undetected infections. We then estimate the parameters using our framework and 
compare them to the underlying true parameters. 

Sensitivity analysis using school-specific infection strength parameters. The infection 
strength exerted on individuals may differ across schools, resulting in non- 
independence between individuals attending the same school. To assess the impact 
of any such correlation on our parameters, we performed a sensitivity analysis in 
which we included a separate force of infection parameter for each school. In this 
model the force of infection exerted on an individual who attends school sch is:

where $\lambda$ is a parameter that represents the mean daily force of infection in 1998
in school 1, $\beta_{\lfloor t \rfloor}$ is the mean force of infection in year $\lfloor t \rfloor$ as compared to that in 1998
and $\zeta_{sch}$ is the mean force of infection for school sch compared to school 1.

Alternative functional forms for the decay in titres. Alternative functional forms for 
the decay in antibody titres exist. In particular, biphasic models that model both 
short-term and longer-term antibody decay with different exponential decay rates 
have been shown to work well in other systems, such as malaria. The biphasic
form is captured by:

$$\text{Titre}_t = \theta_1\big(\theta_2 \exp(-\theta_3 t) + (1 - \theta_2)\exp(-\theta_4 t)\big)$$

where $\theta_1$, $\theta_2$, $\theta_3$ and $\theta_4$ capture the decay of the titres. To explore whether this
biphasic form may further refine how antibodies behave following infection, we 
fitted both exponential decay and biphasic models to the observed infections using 
the observed titres following detected PCR-confirmed infections and the dates of 
symptom onset. We found largely consistent results in the two models (Extended 
Data Fig. 1). As exponential decay is the more parsimonious model, we retained 
this form for the final analysis. Nevertheless, structural uncertainty in the model 
used for the analysis remains, which will not be represented within the confidence 
intervals for the parameters. 

Estimation of impact on titres on infection and disease. We use the augmented times 
and serotypes of infection from 100 model iterations to reconstruct the antibody 
titre trajectories for each individual. For each augmented dataset, we extract the 
mean titre across all four serotypes for each day and whether they got infected the 
following day or not. Person time for individuals who were considered not sus- 
ceptible (that is, had been infected in the prior 90 days) was excluded. To explore 
the relationship between mean titre and the probability of infection, we conducted 
logistic regression for which a polynomial spline of order 2 for the mean titre was 
used (determined as the optimal model through comparison of different poly- 
nomial models using the Akaike information criterion (AIC)). To account for 
sampling uncertainty, in each reconstructed dataset, we use a bootstrap approach to 
sample all individuals with replacement and then re-perform the logistic regression
each time. We present the mean and 95% confidence intervals from the resulting
distribution of the logistic model estimates of the probability of infection for each 
titre obtained from across the model iterations. 
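The regression-and-bootstrap procedure above can be sketched in pure Python for one reconstructed dataset. To keep the sketch self-contained, it makes several assumptions. It fits an ordinary quadratic logistic model (intercept, titre, titre squared) by Newton-Raphson rather than a spline basis. It reads the 95% interval off the sorted bootstrap estimates by index. The function names and the `data` layout (individual mapped to a list of (mean titre, outcome next day) pairs) are hypothetical. In practice, a statistics library would be preferable.

```python
import math
import random


def sigmoid(eta):
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for k in range(c, n + 1):
                    M[r][k] -= f * M[c][k]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_quadratic_logistic(titres, outcomes, iters=30):
    """Logistic regression of outcome on (1, titre, titre^2) by Newton-Raphson."""
    X = [(1.0, x, x * x) for x in titres]
    beta = [0.0, 0.0, 0.0]
    for _ in range(iters):
        grad = [0.0] * 3
        hess = [[0.0] * 3 for _ in range(3)]
        for xi, yi in zip(X, outcomes):
            mu = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            w = mu * (1.0 - mu)
            for a in range(3):
                grad[a] += (yi - mu) * xi[a]
                for c in range(3):
                    hess[a][c] += w * xi[a] * xi[c]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


def bootstrap_curve(data, grid, n_boot=200, rng=None):
    """Resample individuals with replacement, refit, and return
    (mean, 2.5%, 97.5%) of the predicted probability at each grid titre."""
    rng = rng or random.Random(0)
    ids = list(data)
    curves = []
    for _ in range(n_boot):
        titres, outcomes = [], []
        for ind in (rng.choice(ids) for _ in ids):
            for titre, outcome in data[ind]:
                titres.append(titre)
                outcomes.append(outcome)
        b0, b1, b2 = fit_quadratic_logistic(titres, outcomes)
        curves.append([sigmoid(b0 + b1 * g + b2 * g * g) for g in grid])
    summary = []
    for j in range(len(grid)):
        vals = sorted(curve[j] for curve in curves)
        lo = vals[int(0.025 * (len(vals) - 1))]
        hi = vals[int(0.975 * (len(vals) - 1))]
        summary.append((sum(vals) / len(vals), lo, hi))
    return summary
```

Resampling whole individuals rather than individual days keeps each person's correlated daily observations together.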

We explore the relationship between mean titre and the probability of having 
different disease outcomes. We consider three different outcomes: symptomatic 
infection (irrespective of severity), hospitalization and DHF. We only consider 
titres during the active surveillance windows and whether or not individuals had 
an infection the following day that led to the outcome of interest. For each outcome, 
we conduct logistic regression in which we use a polynomial spline of order 2 for 
the mean titre (consistently determined as the optimal model through comparison 
of different polynomial models using AIC). We use a bootstrap approach to sample 
all individuals with replacement and then re-perform the logistic regression each 
time and identified the mean and 95% confidence intervals from the resultant 
distribution for the estimates of the probability of having an infection that led to 
the outcome of interest for each titre obtained from across the model iterations. 

For those individuals that became infected during the active surveillance 
windows, we fit logistic models to the mean titres and whether or not the disease 
outcome occurred. We looked at three outcomes: any symptomatic illness,
hospitalization and DHF. For each of the three outcomes, we compare an
intercept-only model with models with a polynomial spline up to order 2. To account
for sampling uncertainty, in each reconstructed dataset, we use a bootstrap approach
to sample all individuals who had an infection during the surveillance windows with
replacement and then re-perform the logistic regression each time. We present the
mean and 95% confidence intervals from the resultant distribution of the logistic
model estimates of the probability of the outcome for each titre obtained from across
the model iterations.

PRNT titres are available for a subset of 1,771 blood draws. For those that 
became infected during the active surveillance windows and PRNT titres are 
available in the six-month window before infection, we fit logistic models to these 
mean PRNT titres from that six-month time frame and whether or not the disease
outcome occurred. We looked at three outcomes: any symptomatic illness,
hospitalization and DHF. For each of the three outcomes of interest, we compare
an intercept-only model with models with a polynomial spline up to order 2. To
account for sampling uncertainty, in each reconstructed dataset, we use a bootstrap 
approach to sample all individuals who had an infection during the surveillance 
windows with replacement and then re-perform the logistic regression each time. 
To account for the fact that individuals and serum samples may not have been 
completely selected at random for PRNT testing (for example, preferential testing 
of those with symptomatic disease), we adjusted our estimate for the probability 
of sampling conditional on the outcome of interest. 

From the logistic regression described above, we can extract the probability
of the outcome of interest given a particular PRNT titre and that a PRNT was
conducted. Using Bayes' rule we can write:

$$P(\text{outcome} \mid \text{titre}, \text{PRNT done}) = \frac{P(\text{PRNT done} \mid \text{outcome}, \text{titre})\, P(\text{outcome} \mid \text{titre})}{P(\text{PRNT done} \mid \text{titre})}$$

As the PRNT titre (or the haemagglutination inhibition titre) was not taken into
account in the selection process for choosing whether or not a PRNT was done,
this becomes:

$$P(\text{outcome} \mid \text{titre}, \text{PRNT done}) = \frac{P(\text{PRNT done} \mid \text{outcome})\, P(\text{outcome} \mid \text{titre})}{P(\text{PRNT done})}$$

As we are interested in $P(\text{outcome} \mid \text{titre})$, we can reorder this equation to:

$$P(\text{outcome} \mid \text{titre}) = \frac{P(\text{outcome} \mid \text{titre}, \text{PRNT done})\, P(\text{PRNT done})}{P(\text{PRNT done} \mid \text{outcome})}$$

We therefore multiply our logistic model outcomes by the following adjustment
factor:

$$\text{adjustment factor} = \frac{P(\text{PRNT done})}{P(\text{PRNT done} \mid \text{outcome})}$$


P(PRNT done) is calculated as the proportion of all infection events for which 
a PRNT was conducted in the prior 6 months from the infection and P(PRNT 
done | outcome) is calculated as the proportion with the outcome of interest for 
which PRNTs were conducted in the prior 6 months. We present the mean and
95% confidence intervals from the resultant distribution of the logistic model
estimates of the probability of the outcome for each titre obtained from across the
model iterations.
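The selection adjustment can be illustrated with a small arithmetic sketch. The counts and function names below are hypothetical. If 20% of all infection events had a PRNT in the prior 6 months, but 40% of those with the outcome did, the factor is 0.2 / 0.4 = 0.5. This halves the logistic estimates, correcting for preferential testing of cases with the outcome.

```python
def prnt_adjustment_factor(n_infections, n_with_prnt, n_outcome, n_outcome_with_prnt):
    """P(PRNT done) / P(PRNT done | outcome), each estimated as the proportion of
    infection events with a PRNT conducted in the prior 6 months."""
    p_done = n_with_prnt / n_infections
    p_done_given_outcome = n_outcome_with_prnt / n_outcome
    return p_done / p_done_given_outcome


def adjust(prob_given_prnt, factor):
    """Convert P(outcome | titre, PRNT done) into P(outcome | titre)."""
    return prob_given_prnt * factor
```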
We used a logistic regression approach to explore the impact of year of infection
and the age at the time of infection. To explore the impact of year, we take
each augmented dataset in turn and sample all the individuals with replacement
to incorporate sampling uncertainty. We then regress whether the outcome $Y_{i,t}$
occurred on the year of infection (as a categorical variable):

$$\operatorname{logit}\big(P(Y_{i,t} = 1)\big) = \beta_0 + \beta_1 \times \text{Year}_{i,t}$$


where $\text{Year}_{i,t}$ is the year (1998, 1999, 2000, 2001 or 2002) within which day t
occurred for individual i. We conducted separate regressions for which the outcome
was an infection event (irrespective of whether the infection led to symptoms),
symptomatic infection events (irrespective of disease severity), hospitalization and
development of DHF. For the last three models, we only considered data during the
active surveillance windows, as we do not know the symptom status of infections
outside these windows. To explore the impact of age, we dichotomized the age of
individuals as being less than or greater than 9 years (the Sanofi-Pasteur vaccine is
not recommended for individuals under 9). We then performed the regression:


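$$\operatorname{logit}\big(P(Y_{i,t} = 1)\big) = \beta_0 + \beta_1 \times \text{Age}_{i,t}$$

where separate models for the same four outcomes, $Y_{i,t}$, were performed. Finally,
we built multivariable models that also accounted for mean titre using a polynomial
of order 2:

$$\operatorname{logit}\big(P(Y_{i,t} = 1)\big) = \beta_0 + \beta_1 \times \text{Age}_{i,t} + \beta_2 \times \text{Titre}_{i,t} + \beta_3 \times \text{Titre}_{i,t}^2$$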


Impact of titre on outcome using Cox proportional hazard models. In the context 
of small probabilities of an event occurring and short time intervals between 
readings, logistic regression will give consistent results with those obtained from 
Cox proportional hazards models that specifically take the non-independence
of titre observations from the same individuals into account. To demonstrate
the consistency of the two approaches, we estimate the impact of titre on our four
outcomes (infection, symptomatic infection, hospitalized infection and DHF
infection) using a time-varying Cox proportional hazards model, specifically
incorporating clustering of observations by individual. We used 100 augmented
datasets. For each augmented dataset, we extract the mean titre across
all four serotypes for each day and whether they got the outcome of interest the 
following day or not. For the disease-specific outcomes (any symptomatic disease, 
hospitalized infection and DHF infection), we only used time points during the 
surveillance windows. We then calculated the impact of the mean titre (polynomial
of order 2) on the relative hazard of infection, incorporating a clustering
ID per individual using the survival package in R. We then calculate the mean
effect of titre on the outcome of interest by averaging the estimates across the 
reconstructed datasets. 

To compare our results using logistic regression, we multiply the annualized 
estimate of a titre x on the risk of the outcome (calculated as 1 − exp(−365x)) by
the estimated baseline hazard for those cases with a measured titre of 0 (calculated 
as the proportion of infections in time points with a measured titre of 0). We find 
that the results are almost identical (Extended Data Fig. 6). As the logistic model
approach allows us to directly estimate the underlying probability of the outcome,
we prefer it.

Survival analysis. Annualized probability of infection using titre data only. Over 
100 reconstructed datasets, we initially identify all individuals who experienced 
an infection (irrespective of disease severity). We then identify the set-point 
antibody load for that infection as the mean titre one year following infection 
as predicted by our model. Individuals were divided into two groups, those with 
a set-point antibody load <3 and those with a load >3. For each individual in 
each titre group, we use the logistic model described in ‘Estimation of impact on 
titres on infection and disease’ to predict the daily probability of a subsequent 
infection based on the mean titres each day following the initial infection. We 
also calculated the daily probability of experiencing an infection that leads to 
DHF. We annualize the predicted probabilities of subsequent infection by using
the conversion 1 − exp(−365x), where x is the daily probability of infection. We
present the mean annualized probabilities across all individuals and over all the 
reconstructed datasets. 
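The annualization and grouping can be sketched as follows. The sketch assumes each individual is summarized by a set-point titre and a list of model-predicted daily probabilities after the initial infection. It averages the annualized probability across individuals separately for each day after infection. The text's "<3" and ">3" groups are implemented as below 3 and at or above 3, which is an assumption about the boundary. The function names are hypothetical.

```python
import math


def annualize(daily_p):
    """Annualized probability 1 - exp(-365 x) from a daily probability x."""
    return 1.0 - math.exp(-365.0 * daily_p)


def mean_annualized_by_group(individuals, cutoff=3.0):
    """individuals: list of (set-point titre, [predicted daily probability on
    day 1, 2, ... after the initial infection]).

    Returns {"low": [...], "high": [...]}: the mean annualized probability on
    each day across individuals with set-point below / at or above the cutoff.
    """
    groups = {"low": [], "high": []}
    for set_point, daily in individuals:
        key = "low" if set_point < cutoff else "high"
        groups[key].append([annualize(p) for p in daily])
    result = {}
    for key, rows in groups.items():
        if rows:
            n_days = min(len(r) for r in rows)
            result[key] = [sum(r[d] for r in rows) / len(rows) for d in range(n_days)]
    return result
```

A constant daily probability of 1/365 corresponds to an annualized probability of 1 − e⁻¹, or roughly 63%.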

Kaplan-Meier analysis. For individuals who experienced an infection, we calculate 
Kaplan-Meier survival curves for experiencing a subsequent infection (both irre- 
spective of disease outcome and for DHF only). Over 100 reconstructed datasets, 
we identify all individuals who experienced an infection event. We then identify 
the set-point antibody load for that infection as the mean titre one year following 
infection as predicted by our model. Individuals were divided into two groups, 
those with a set-point antibody load <3 and those with a load >3. To incorporate 
sampling uncertainty, we resample all individuals with replacement. For each 
group, we then calculate Kaplan-Meier survival curves. We present the mean and
the 2.5% and 97.5% quantiles from the resulting distribution.
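The product-limit estimator applied to each set-point group can be sketched as follows. This is a generic Kaplan-Meier implementation, not the study's code. It takes follow-up times from the initial infection and an indicator of whether a subsequent infection was observed (0 for censored). In the procedure described above, it would be rerun on each bootstrap resample, and the quantiles would be taken across runs.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve.

    times: follow-up time after the initial infection.
    events: 1 if a subsequent infection was observed at that time, 0 if censored.
    Returns [(time, survival)] at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_censored = 0
        while i < len(data) and data[i][0] == t:  # group ties at time t
            if data[i][1]:
                n_events += 1
            else:
                n_censored += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_events + n_censored  # leave the risk set after time t
    return curve
```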
Prediction of DHF outcome using mean titre. We assess the ability of our logistic
model to discriminate between those who developed DHF and those who did not
using a leave-one-out cross-validation method.

For each reconstructed dataset, taking each DHF case in turn, we initially iden- 
tified all individuals who were in the cohort at the same time as the DHF infection 
with detectable titres who themselves did not have a DHF infection within a 1-year 
period. We then randomly selected one of those individuals and used the titre from 
that day. Once we had selected a matched control for each DHF case, we calculated 
the receiver operating characteristic (ROC) using leave-one-out cross-validation. 
To do this, we removed each individual in turn from the dataset (including both the 
cases and the controls) and recalculated the relationship between mean haemagglu- 
tination inhibition titre and DHF infection using all the remaining titre readings. 
We then predicted the probability that the held-out case had a DHF infection. The 
ROC was calculated using these probabilities across individuals. We present the 
mean ROC from across 100 reconstructed datasets. 
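The leave-one-out and ROC steps can be sketched generically. Here `fit` and `predict` are hypothetical callbacks that would wrap the logistic model of mean haemagglutination inhibition titre against DHF. For brevity, the sketch refits on the remaining rows of whatever list it is given, whereas the text refits on all remaining titre readings. The AUC uses the rank (Mann-Whitney) form, the probability that a case scores higher than a control, with ties counted as one half.

```python
def roc_points(scores, labels):
    """(false positive rate, true positive rate) at each distinct threshold."""
    n_pos = sum(1 for lab in labels if lab)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, lab in zip(scores, labels) if lab and s >= thr)
        fp = sum(1 for s, lab in zip(scores, labels) if not lab and s >= thr)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic."""
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def leave_one_out(individuals, fit, predict):
    """Refit without each individual in turn and predict for the held-out one."""
    preds = []
    for k, held_out in enumerate(individuals):
        model = fit(individuals[:k] + individuals[k + 1:])
        preds.append(predict(model, held_out))
    return preds
```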

Clustering of infections by school. For additional model validation, we explore 
whether augmented infections occurred in the same schools around the same 
time as observed cases, despite no information on location being provided to 
the model. 

Clustering of subclinical infections within schools. To explore the clustering of
subclinical infections with symptomatic infections in schools, we use the tau
clustering statistic to calculate the odds of observing a subclinical infection
(irrespective of serotype and infection parity) within a set time period $(t_1, t_2)$ of a
symptomatic infection within the same school relative to the odds of observing a
subclinical infection in a different school within the same time window:

$$\tau(t_1, t_2) = \frac{\theta(t_1, t_2)}{\theta(0, \infty)}$$

where:

$$\theta(t_1, t_2) = \frac{\sum_{i=1}^{N_{\text{symp}}} \sum_{j=1}^{N_{\text{asymp}}} I(sch_{ij} = 1,\ t_1 < |s_{ij}| < t_2)}{\sum_{i=1}^{N_{\text{symp}}} \sum_{j=1}^{N_{\text{asymp}}} I(sch_{ij} = 0,\ t_1 < |s_{ij}| < t_2)}$$

where $N_{\text{symp}}$ and $N_{\text{asymp}}$ are the numbers of symptomatic and subclinical infections
within any model iteration, I is an indicator variable, $sch_{ij}$ is equal to 1 if individuals
i and j go to the same school and 0 otherwise, and $s_{ij}$ is the time between infections.
We varied the time window between 0-90 days, 90-180 days and greater than 180 days.
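The tau statistic for one reconstructed dataset can be sketched as follows. Each symptomatic infection is paired with each subclinical infection, and the ratio of same-school to different-school pairs within the time window is divided by the same ratio over all time separations. Window boundaries follow the strict inequality t1 < |s_ij| < t2 as written. The function names and the (school id, infection day) layout are hypothetical, and the sketch returns `nan` when a denominator is empty.

```python
import math


def theta(pairs, t1, t2):
    """Ratio of same-school to different-school pairs with t1 < |gap| < t2.

    pairs: list of (same_school: bool, gap_days). Returns nan if no
    different-school pair falls in the window.
    """
    same = sum(1 for sch, gap in pairs if sch and t1 < abs(gap) < t2)
    diff = sum(1 for sch, gap in pairs if not sch and t1 < abs(gap) < t2)
    return same / diff if diff else math.nan


def tau(symptomatic, subclinical, t1, t2):
    """Tau statistic theta(t1, t2) / theta(0, inf) for one reconstructed dataset.

    symptomatic, subclinical: lists of (school id, infection day).
    """
    pairs = [(sch_i == sch_j, day_j - day_i)
             for sch_i, day_i in symptomatic
             for sch_j, day_j in subclinical]
    overall = theta(pairs, 0, math.inf)
    return theta(pairs, t1, t2) / overall if overall else math.nan
```

A value above 1 indicates that subclinical infections close in time to a symptomatic case are disproportionately in the same school.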
Clustering of serotypes within schools. We explore whether the augmented serotypes 
that were assigned to subclinical primary infections (serotypes could not reliably 
be assigned in subsequent infections because of cross-reactions) were consistent 
with the serotypes of the symptomatic infections of individuals within the same 
school for different periods of time. 

For augmented primary infections that are consistently of the same serotype 
(defined as >50% of augmented datasets having a primary infection in the same 
individual caused by the same serotype in the same six-month time window), we 
calculated the odds that an augmented primary infection that occurs in the same 
school and within a fixed time window of a PCR-confirmed case is of the same 
serotype relative to the odds that an augmented primary infection that occurs 
within the same time window in a different school is of the same serotype. 


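$$\tau(t_1, t_2) = \frac{\theta_2(t_1, t_2)}{\theta_3(t_1, t_2)}$$

where:

$$\theta_2(t_1, t_2) = \frac{\sum_{i=1}^{N_{symp}} \sum_{j=1}^{N_{asymp}} I(sch_{ij} = 1,\ t_1 < |s_{ij}| < t_2,\ ser_{ij} = 1)}{\sum_{i=1}^{N_{symp}} \sum_{j=1}^{N_{asymp}} I(sch_{ij} = 1,\ t_1 < |s_{ij}| < t_2,\ ser_{ij} = 0)}$$

$$\theta_3(t_1, t_2) = \frac{\sum_{i=1}^{N_{symp}} \sum_{j=1}^{N_{asymp}} I(sch_{ij} = 0,\ t_1 < |s_{ij}| < t_2,\ ser_{ij} = 1)}{\sum_{i=1}^{N_{symp}} \sum_{j=1}^{N_{asymp}} I(sch_{ij} = 0,\ t_1 < |s_{ij}| < t_2,\ ser_{ij} = 0)}$$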


where ser_ij is equal to 1 if the infections of individuals i and j were caused by the same serotype and 0 otherwise. We varied
the time window across 0-90 days, 90-180 days and more than 180 days.
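The serotype version compares two odds rather than comparing a window against all gaps. In this sketch, infections are hypothetical (school_id, infection_day, serotype) tuples. One side holds the PCR-confirmed symptomatic cases; the other holds the augmented primary infections with a consistently assigned serotype. As in the text, only pairs inside the time window are counted. Undefined odds are returned as NaN.

```python
import math


def serotype_odds(symptomatic, primaries, t1, t2, same_school):
    """Odds that a pair within the (t1, t2) window shares a serotype, among pairs
    in the same school (same_school=True) or in different schools (False).
    Each infection is a (school_id, infection_day, serotype) tuple."""
    same_sero = diff_sero = 0
    for school_i, day_i, sero_i in symptomatic:
        for school_j, day_j, sero_j in primaries:
            if (school_i == school_j) != same_school:
                continue  # pair belongs to the other school category
            if t1 < abs(day_i - day_j) < t2:
                if sero_i == sero_j:
                    same_sero += 1
                else:
                    diff_sero += 1
    return same_sero / diff_sero if diff_sero else math.nan


def serotype_tau(symptomatic, primaries, t1, t2):
    """Same-school odds of a shared serotype relative to different-school odds."""
    within = serotype_odds(symptomatic, primaries, t1, t2, True)
    between = serotype_odds(symptomatic, primaries, t1, t2, False)
    if math.isnan(between) or between == 0:
        return math.nan
    return within / between
```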
Uncertainty. To incorporate sampling uncertainty into our estimates, for each
model iteration we resampled all infection events with replacement before
calculating the tau estimates. The 95% confidence intervals were calculated from
the 2.5% and 97.5% quantiles of the resulting distribution across all model
iterations.
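The resampling step can be sketched as below. This illustration rests on two assumptions. First, `statistic` is any hypothetical callable that maps a list of infection events to a tau estimate, for example by splitting the events into symptomatic and subclinical infections and returning tau. Second, the quantiles use linear interpolation, which is one of several common conventions.

```python
import math
import random


def quantile(sorted_values, q):
    """Quantile of an already sorted list by linear interpolation (0 <= q <= 1)."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def resampled_interval(iterations, statistic, rng, lower=0.025, upper=0.975):
    """iterations: one list of infection events per model iteration.
    For each iteration, resample its events with replacement, recompute the
    statistic, then take the lower and upper quantiles across iterations."""
    estimates = []
    for events in iterations:
        resampled = [rng.choice(events) for _ in events]
        value = statistic(resampled)
        if not math.isnan(value):  # drop iterations where the statistic is undefined
            estimates.append(value)
    estimates.sort()
    return quantile(estimates, lower), quantile(estimates, upper)
```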

Different approaches to identify infections using simple cutoff points. To assess 
the sensitivity and specificity of the current approach to identify infections based 
on titre differences across two blood draws, we simulated titre trajectories in which 
infections did and did not take place. 
Simulated titres in which infections did take place. We used the following algorithm:

(1) Randomly draw an MCMC iteration.

(2) Randomly divide the population of individuals who had at least one
infection into two groups: ‘model fit’ individuals and ‘held out’ individuals.

(3) Of the model fit individuals, randomly draw an individual i. 

(4) Identify the parameters for the antibody dynamics for the first infection of
that individual (that is, the infection-specific parameters for infection 1) and the
baseline titre A_{i,s}(t_0) from that MCMC iteration. The true titre for each serotype s
is then A_{i,s}(t_0).

(5) Calculate the measured titre for each serotype using a random draw from a
normal distribution with mean A_{i,s}(t_0) and standard deviation σ, where σ represents
the measurement error of the assay. Under scenarios of a discrete assay, the
measured titre is rounded down to the nearest integer.

(6) Draw an infection time point from a uniform distribution between 0 and
t_max, where t_max represents the time of the second blood draw.

(7) Calculate the true titre at t_max for each serotype, A_{i,s}(t_max).

(8) Calculate the measured titre using a random draw from a normal distribution
with mean A_{i,s}(t_max) and standard deviation σ. Under scenarios of a discrete
assay, the measured titre is rounded down to the nearest integer.

Simulated titres where infections did not take place. We used the following 
algorithm: 

(1) Randomly draw an MCMC iteration.

(2) Randomly divide the population of individuals who had at least one
infection into two groups: ‘model fit’ individuals and ‘held out’ individuals.

(3) Of the model fit individuals, randomly draw an individual i. 

(4) Identify the baseline titre A_{i,s}(t_0) from that MCMC iteration. The true titre
for each serotype s is then A_{i,s}(t_0).

(5) Calculate the measured titre for each serotype using a random draw from
a normal distribution with mean A_{i,s}(t_0) and standard deviation σ, where σ
represents the measurement error of the assay. Under scenarios of a discrete assay,
the measured titre is rounded down to the nearest integer.

(6) Calculate a second measured titre using a random draw from a normal distribution
with mean A_{i,s}(t_0) and standard deviation σ. Under scenarios of a discrete
assay, the measured titre is rounded down to the nearest integer.
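The measurement steps of the two algorithms can be sketched as follows: steps (5) to (8) of the first and steps (5) to (6) of the second. Steps (1) to (4) are left to the caller, who supplies the baseline true titres; these steps draw an MCMC iteration and an individual and read off the parameters. The antibody-dynamics model itself is not specified here, so `titre_change` is a hypothetical callable standing in for it. Titres are assumed to be on the model's log scale.

```python
import math
import random


def measure(true_titre, sigma, discrete, rng):
    """Measured titre: true titre plus Gaussian assay error (s.d. sigma),
    rounded down to the nearest integer for a discrete assay."""
    value = rng.gauss(true_titre, sigma)
    return math.floor(value) if discrete else value


def simulate_infected(baseline, titre_change, t_max, sigma, discrete, rng):
    """One individual infected between the two blood draws.
    baseline: true titres A_{i,s}(t_0), one per serotype.
    titre_change: hypothetical callable (serotype, days_since_infection) giving the
    change in true titre; it stands in for the fitted antibody-dynamics model."""
    first = [measure(a, sigma, discrete, rng) for a in baseline]  # step (5)
    t_inf = rng.uniform(0, t_max)  # step (6): infection time in [0, t_max]
    true_at_tmax = [a + titre_change(s, t_max - t_inf)  # step (7)
                    for s, a in enumerate(baseline)]
    second = [measure(a, sigma, discrete, rng) for a in true_at_tmax]  # step (8)
    return first, second


def simulate_uninfected(baseline, sigma, discrete, rng):
    """No infection: both draws measure the same true titre with independent error."""
    first = [measure(a, sigma, discrete, rng) for a in baseline]
    second = [measure(a, sigma, discrete, rng) for a in baseline]
    return first, second
```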

Different assays. The current approach is to test whether there is a fourfold rise
between blood draws in any of the four serotypes using the discrete haemagglutination
inhibition assay.

The ‘mean’ approach is to first calculate the mean across the four serotypes at
each time point and then compare the mean titres across two time points to identify
whether infections have occurred or not.

Some assays give titres on a continuous scale (rather than the discretized scale of the
haemagglutination inhibition assay). In the ‘continuous assay’ approach, as with
the mean approach, we first calculate the mean titre across the four serotypes
at each time point and then compare the mean titres across two time points to
identify whether infections have occurred or not.
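The two decision rules can be written as simple predicates. For illustration, this sketch assumes titres are on a log2 scale, so a fourfold rise corresponds to an increase of 2 units. The `cutoff` argument is the threshold that the assessment of the different assays varies.

```python
def any_serotype_rise(first, second, cutoff=2.0):
    """Current practice: infection if any serotype rises by at least `cutoff`
    log2 units between draws (cutoff=2 corresponds to a fourfold rise)."""
    return any(after - before >= cutoff for before, after in zip(first, second))


def mean_rise(first, second, cutoff):
    """'Mean' approach (discrete or continuous assay): average the log2 titres
    across serotypes at each draw, then compare the two means."""
    mean_first = sum(first) / len(first)
    mean_second = sum(second) / len(second)
    return mean_second - mean_first >= cutoff
```

The rules can disagree. A 2-unit jump in a single serotype triggers the current rule but moves the mean by only 0.5 units. A uniform 1-unit rise across all serotypes does the opposite.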

Assessment of the different assays across time between blood draws and error in assay.
Using the simulation approaches set out above, we obtained 10,000 individuals with
pairs of measured titres (with one titre for each serotype) for whom an infection
did take place between the titre measurements and a further 10,000 individuals
with pairs of measurements for whom no infection took place. We varied the time
between blood draws (t_max) from 10 days to 400 days and the error in the assay
(σ) from 0.1 to 1. For each resultant dataset, we used the held-out dataset (that
is, those individuals not included in the model fitting) to calculate the sensitivity
and specificity under each of the different measurement approaches. Each time, we
also identified the cutoff point that maximized the sensitivity while maintaining at
least 95% specificity. We performed a separate analysis in which we identified cutoff
points to maximize sensitivity while maintaining 99% specificity.
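Choosing the cutoff can be sketched as a scan over candidate thresholds. In this illustration, each simulated titre pair has already been reduced to a single hypothetical score, such as the largest or the mean rise in titre. A pair is called infected when its score is at or above the cutoff. Ties in sensitivity are resolved in favour of the lowest cutoff.

```python
import math


def best_cutoff(infected_scores, uninfected_scores, min_specificity=0.95):
    """Return (cutoff, sensitivity, specificity) with the highest sensitivity
    among cutoffs whose specificity is at least min_specificity."""
    # Every observed score is a candidate; infinity guarantees at least one
    # cutoff (calling nothing infected) that reaches 100% specificity.
    candidates = sorted(set(infected_scores) | set(uninfected_scores)) + [math.inf]
    best = None
    for cutoff in candidates:
        specificity = sum(s < cutoff for s in uninfected_scores) / len(uninfected_scores)
        if specificity < min_specificity:
            continue
        sensitivity = sum(s >= cutoff for s in infected_scores) / len(infected_scores)
        if best is None or sensitivity > best[1]:
            best = (cutoff, sensitivity, specificity)
    return best
```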

Comparison between PRNT and haemagglutination inhibition titres. For 1,771
blood draws, both PRNTs and haemagglutination inhibition tests were conducted.
We compared the mean PRNT log titre across the four serotypes with the mean
haemagglutination inhibition log titre across the four serotypes and fitted a line
through the two using linear regression. We compared different polynomial models
up to order 2 and used the best-fitting one as determined by AIC.
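Model selection between the linear and quadratic fits can be sketched as below. This is an illustration under assumptions, not the authors' code. It fits orders 1 and 2 by ordinary least squares through the normal equations. It then ranks them by the Gaussian form of AIC, n ln(RSS/n) + 2p, where p counts the polynomial coefficients plus the residual variance. The sketch assumes the residual sum of squares is non-zero.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_polynomial(x, y, order):
    """Least-squares polynomial fit; returns (coefficients from x^0 upwards, RSS)."""
    k = order + 1
    design = [[xi ** p for p in range(k)] for xi in x]
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    coefs = solve(xtx, xty)
    rss = sum((yi - sum(c * xi ** p for p, c in enumerate(coefs))) ** 2
              for xi, yi in zip(x, y))
    return coefs, rss


def best_polynomial(x, y, max_order=2):
    """Return (aic, order, coefs) for the lowest-AIC polynomial of order 1..max_order."""
    n = len(x)
    results = []
    for order in range(1, max_order + 1):
        coefs, rss = fit_polynomial(x, y, order)
        aic = n * math.log(rss / n) + 2 * (order + 2)  # coefficients + variance
        results.append((aic, order, coefs))
    return min(results)
```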

Comparison with Sanofi-Pasteur vaccine titres. To explore the potential impact
of the Sanofi vaccine, we extracted the geometric mean PRNT titres following
vaccination for both seronegative and seropositive individuals who were vaccinated
in Latin America (Villar et al.). The extracted values for PRNT titre 28 days after the second
injection (see Supplementary Table 8 in Villar et al.) are shown in Supplementary
Table 8.

The values 28 days after the third injection are also available: 81 for
those who were seronegative before vaccination and 658 for those who were
seropositive before vaccination. We plot these values on the relationship
between haemagglutination inhibition titre and PRNT titre from our
assays (Fig. 4d).

Ethical approval. The cohort protocol was approved by the Institutional 
Review Boards of the Thai Ministry of Public Health, the Office of the US 
Army Surgeon General and the University of Massachusetts Medical School. 
Informed consent was obtained from participants and their parents/guardians. 
No personally identifiable information was available to the researchers for the
presented analysis.

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Code availability. C++ code is available from the corresponding author upon 
request. 

Data availability. De-identified data used in this project are available as part of
this manuscript. All date information was removed to de-identify the
dataset. Individuals interested in accessing a full dataset with identifying information
should contact the corresponding author to obtain the necessary IRB
approval.
Extended Data Fig. 1 | Comparison of biphasic versus exponential
decay. Biphasic and exponential decay curves fitted to haemagglutination 
inhibition antibody measurements following observed symptomatic 
infections. 
Extended Data Fig. 2 | Variability in titre responses and measurement
error and bias by serotype. a, Variability in titre responses. Violin plots
showing median (black square), 25% and 75% quantiles (thick black line)
and 95% of the distribution (grey) of net titre increases at different time points
after infection (n = 1,420). b, Estimated underlying differences across
serotypes in the measurement of antibody levels by haemagglutination
inhibition assay over that attributable to infection (DENV1 is the reference
(Ref.)), with 95% credible intervals (fitted to data from 140,612 titre
measurements). c, Mean estimated error in the haemagglutination
inhibition assay with 95% credible intervals, estimated using our model
(grey) and derived empirically (blue) from 795 repeated
measurements on the same serum, compared with previously
empirically derived estimates for PRNTs (blue).
Extended Data Fig. 3 | Serotype distributions. a, Distribution of
serotypes by year comparing the detected symptomatic infections by PCR
and the augmented primary infections for which we could confidently
assign the serotype (>50% of model iterations inferring the same
serotype). We could confidently assign the serotype in 60% of cases.
b, Serotype distribution for detected symptomatic primary infections and
augmented subclinical primary infections for which the infecting serotype
could be confidently assigned (>50% of model iterations inferring
the same serotype). c, Distribution of serotypes by year comparing the
detected symptomatic infections by PCR and the augmented primary
infections using a more stringent cutoff, requiring >75% of model iterations
to infer the same serotype. In this scenario, we could confidently assign the
serotype in 32% of instances.
Extended Data Fig. 4 | Cox proportional hazards model versus logistic
regression. Comparison of results using a time-varying Cox proportional
hazards model (dashed line) with those from logistic regression (solid line)
for the annualized probability of infection (a), developing any symptoms
(b), being hospitalized (c) and developing DHF (d) as a function of the
mean measured antibody titre across all serotypes at the time of exposure
using titre data from all study subjects (n = 3,451). The open circles on
the left represent primary infections (that is, those with no detectable
titres for any serotype before exposure). The shaded regions represent
95% bootstrap confidence intervals. To calculate probabilities, the relative
hazards from the Cox model are multiplied by the baseline hazard for
those with measured titres of 0 (calculated as the proportion of person-time
with an infection among those with measured titres of 0).
Extended Data Fig. 5 | ROC for the identification of DHF infections.
Ability of the modelled relationship between measured haemagglutination
inhibition titre and risk of DHF to identify those with DHF, comparing those
with DHF with randomly selected matched controls from
individuals in the cohort who had detectable titres at the same time
(n = 36 with DHF and the same number of matched controls). AUC, area
under the curve.
Extended Data Fig. 6 | Probability of disease as a function of
haemagglutination inhibition and PRNT titre. Probability of disease as a
function of mean titre across the four serotypes at the time of infection.
a, For those infected during the surveillance windows, the probability
of developing any symptoms as a function of mean titre (n = 781).
b, For those infected during the surveillance windows, the probability of
being hospitalized (n = 781). c, For those infected during the surveillance
windows, the probability of developing DHF as a function of mean titre
(n = 781). d, For those infected during the surveillance windows (n = 781),
the probability of developing any symptoms as a function of mean PRNT
titre. e, For those infected, the probability of being hospitalized as a
function of mean PRNT titre. f, For those infected, the probability of
developing DHF as a function of mean PRNT titre. In each panel, the
open circles on the left represent primary infections. The shaded region
represents 95% confidence intervals.
Extended Data Fig. 7 | Population-level distribution of titres by birth
cohort and age. a, Proportion of each cohort who are naive as a function
of time. b, Proportion of each cohort who are naive as a function of age.
c, d, Proportion of each cohort with titres above the risk zone (that is, greater
than 3) as a function of time (c) and age (d).
Extended Data Fig. 8 | ROC for infection detection using different
testing protocols. The ROC for different assay approaches and time
between blood draws calculated from 100,000 simulated titre responses.
a, Single serotype assay: haemagglutination inhibition tests
are conducted for only a single serotype at two time points.
b, Haemagglutination inhibition tests conducted against all four serotypes.
Infections are considered to occur when the ratio of any of the four titres
at time point 2 versus time point 1 is greater than the threshold value.
c, Haemagglutination inhibition tests conducted against all four serotypes.
Infections are considered to occur when the ratio of the mean of the four
titres at time point 2 versus the mean at time point 1 is greater than the
threshold value.
Extended Data Fig. 9 | Performance of assay is dependent on time
between blood draws and measurement error. Optimization of assays
for the detection of events in which the specificity is maintained at >95%.
a-c, We explore the performance of three different assay testing protocols:
current practice, for which infection events are defined as a rise above a
cutoff point in any serotype across two blood draws (a); a ‘mean approach’,
for which the mean across all serotypes is first calculated before comparing
the means across time points (b); and a ‘mean approach’ for which titres are
available on a continuous scale (c). For each protocol, we identify the
optimal cutoff point that maintains a specificity of >95% for a range of assay
measurement errors, using 100,000 simulated titres based on the fitted titre
responses from infections in our study population (top row). We
then calculate the sensitivity of the approach for different time intervals
between blood draws using 50% held-out data (bottom row). d-f, Same as


LETTER 


[Figure panels: A, Time only; B, Time and serotype; x axis, Time (days), binned as <3 m, 3-6 m and >6 m]


Extended Data Fig. 10 | Clustering of symptomatic (n = 274) and
subclinical cases (mean n = 507 across 100 reconstructed datasets) by
school by time and serotype. a, Probability of observing an augmented
subclinical infection (irrespective of serotype) at different time
intervals within the same school as a detected symptomatic case relative to
the probability of observing an augmented subclinical infection occurring
in a different school in that same time interval. b, For augmented primary
infections that are consistently of the same serotype (defined as >50%
of augmented datasets having a primary infection in the same individual
caused by the same serotype in the same six-month time window), the
probability that an augmented primary infection that occurs within a
fixed time window of a PCR-confirmed case and in the same school is of
the same serotype relative to the probability that an augmented primary
infection that occurs within the same time window in a different school
is of the same serotype. Note that the modelling framework can only
differentiate serotypes for primary infections; cross-reaction
prevents differentiation in subsequent infections. Overall, 60% of primary
infections are assigned a consistent serotype across
augmented datasets. Each box plot presents the 2.5%, 25%, 75% and
97.5% quantiles of the distribution as well as the mean.
Microglial control of astrocytes in response to microbial metabolites
Tradite Neziraj, Matilde Borio, Michael Wheeler, Loic Lionel Dragin, David A. Laplaud, Jack Antel, Jorge Ivan Alvarez,
Marco Prinz & Francisco J. Quintana


Microglia and astrocytes modulate inflammation and
neurodegeneration in the central nervous system (CNS).
Microglia modulate pro-inflammatory and neurotoxic activities
in astrocytes, but the mechanisms involved are not completely
understood. Here we report that TGFα and VEGF-B produced
by microglia regulate the pathogenic activities of astrocytes in
the experimental autoimmune encephalomyelitis (EAE) mouse
model of multiple sclerosis. Microglia-derived TGFα acts via the
ErbB1 receptor in astrocytes to limit their pathogenic activities and
EAE development. Conversely, microglial VEGF-B triggers FLT-1
signalling in astrocytes and worsens EAE. VEGF-B and TGFα
also participate in the microglial control of human astrocytes.
Furthermore, expression of TGFα and VEGF-B in CD14+
cells correlates with the multiple sclerosis lesion stage. Finally,
metabolites of dietary tryptophan produced by the commensal
flora control microglial activation and TGFα and VEGF-B
production, modulating the transcriptional program of astrocytes
and CNS inflammation through a mechanism mediated by the
aryl hydrocarbon receptor. In summary, we identified positive and
negative regulators that mediate the microglial control of astrocytes.
Moreover, these findings define a pathway through which microbial
metabolites limit pathogenic activities of microglia and astrocytes,
and suppress CNS inflammation. This pathway may guide new
therapies for multiple sclerosis and other neurological disorders.
Microglia are reported to express the aryl hydrocarbon receptor
(AHR). To investigate the role of microglial AHR in CNS inflammation,
we generated Cx3cr1^CreERT2 Ahr^fl/fl mice (CX3CR1-AHR mice)
in which the Cx3cr1 promoter drives the expression of Cre recombinase


Fig. 1 | AHR limits microglial pro-inflammatory
transcriptional responses during EAE. a, EAE
clinical scores in control and CX3CR1-AHR mice
(n = 10 mice per group). Data are representative of
two independent experiments. b, Representative
spinal cord sections from control and CX3CR1-AHR
EAE mice stained for Luxol Fast blue (LFB)
and periodic acid-Schiff (PAS) for demyelination
(top), and MAC3 for macrophage infiltration
(bottom). Representative of three sections from
three mice. Right, quantification of demyelination
and macrophage infiltration. c, Pro-inflammatory
monocytes in the CNS of control and
CX3CR1-AHR EAE mice. Data are representative
of two independent experiments with n = 5
mice per group. d, Microglial mRNA expression
determined by quantitative PCR (qPCR) in control
(n = 8) and CX3CR1-AHR (n = 8) EAE mice.
e, Left, Lys310-acetyl p65 in IBA-1-positive cells
in control and CX3CR1-AHR EAE mice. Right,
quantification of IBA-1/p65 double-positive
cells. Data are representative of two independent
experiments with n = 9 mice per group. f, Heat
map of 9,957 genes expressed in microglia from
control and CX3CR1-AHR mice (n = 3 mice per
group). Gene expression is row-centred and
log2-transformed, and saturated at levels −0.5 and
+0.5 for visualization, satisfying a false discovery
rate (FDR) < 0.1. g, Microglial mRNA expression
determined by qPCR in control (n = 5) and
CX3CR1-AHR (n = 5) EAE mice. Data in a-d, g
are mean ± s.e.m. P values were determined by
two-way analysis of variance (ANOVA) (a) or
two-sided Student's t-test (b-e, g).
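The heat-map transformation described in panel f can be sketched as follows. The legend states neither the order of operations nor a pseudocount, so this illustration makes three assumptions. log2(x + 1) is taken first. Each gene's row mean is then subtracted. Finally, values are clipped at ±0.5 so that larger deviations saturate the colour scale.

```python
import math


def heatmap_values(expression, pseudocount=1.0, limit=0.5):
    """Per gene (row): log2-transform with a pseudocount, subtract the row mean,
    then clip to [-limit, +limit] for display."""
    out = []
    for row in expression:
        logged = [math.log2(v + pseudocount) for v in row]
        centre = sum(logged) / len(logged)
        out.append([max(-limit, min(limit, v - centre)) for v in logged])
    return out
```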
Fig. 2 | AHR-regulated microglial TGFα and VEGF-B control
astrocytes during EAE. a, Heat map of 14,823 genes (detected at 0.1 level
in at least two of three samples) expressed in astrocytes from control and
CX3CR1-AHR mice. Gene expression is row-centred, log2-transformed
and saturated at −0.5 and +0.5 levels for visualization, satisfying an
FDR < 0.1. n = 3 independent biological samples per group. b, mRNA
expression determined by qPCR in control and CX3CR1-AHR EAE
mice (n = 5 per group). c, Network diagram of differentially regulated
genes in astrocytes and their predicted upstream regulators in microglia
(n = 3 independent samples per group). d, Microglial mRNA expression
determined by qPCR in control and CX3CR1-AHR EAE mice. Data are
representative of two independent experiments with n = 6 control and


fused to an oestrogen ligand-binding domain. After treatment of
CX3CR1-AHR mice with tamoxifen, AHR-expressing peripheral
CX3CR1+ cells are replenished from the bone marrow while microglia
remain AHR deficient without impaired survival (Extended
Data Fig. 1a-d). Microglial AHR deletion worsened EAE, increasing
demyelination and CNS monocyte recruitment (Fig. 1a-c); the T-cell
response was unaffected (Extended Data Fig. 1e, f). AHR deletion in
peripheral CX3CR1+ cells, achieved by chronic tamoxifen administration
to bone marrow chimaeras of wild-type mice reconstituted
with CX3CR1-AHR bone marrow, led to earlier EAE onset without
affecting maximal scores and disease recovery (Extended Data Fig. 1g).
AHR deficiency in both CNS-resident and peripheral CX3CR1+ cells
accelerated EAE onset and impaired recovery (Extended Data Fig. 1h).
Collectively, these data suggest that microglial AHR limits EAE.

NF-κB controls microglial responses during EAE, and AHR can
limit NF-κB activation in a SOCS2-dependent manner. The deletion
of microglial AHR decreased Socs2 expression and increased NF-κB
p65 nuclear localization in spinal cord Iba-1+ myeloid cells during EAE
(Fig. 1d, e). Moreover, AHR deletion led to the upregulation of transcripts
associated with microglial activation (Apoe, Ddit4 and B2m),
inflammation and neurodegeneration (Ccl2, Nos2, Il1b and Il23a)
(Fig. 1f, g).
n = 8 (Vegfb) or n = 6 (Tgfa) CX3CR1-AHR mice per group.
e, Effect of MCM and blocking antibodies to TGFα and VEGF-B on gene
expression in primary astrocytes determined by qPCR after 24 h. Data
are representative of three independent experiments with three biological
replicates. f, g, EAE in C57BL/6J mice injected with lentiviral knockdown
constructs targeting Tgfa, Vegfb or control in microglia (f), or Erbb1 (also
known as Egfr), Flt1 or control in astrocytes (g). Data are representative of
two independent experiments with n = 5 mice per group. Data in b, e-g
are mean ± s.e.m. P values were determined by two-sided Student's t-test
(b, d), one-way ANOVA followed by Tukey’s post-hoc test (e) or two-way
ANOVA (f, g). NS, not significant.


Microglia modulate astrocyte phenotype and function. Indeed,
microglial AHR deletion upregulated the expression of genes in astrocytes
associated with inflammation and neurodegeneration, such
as Ccl2, Il1b and Nos2 (Fig. 2a, b). Bioinformatic analyses aimed at
identifying candidate cause-and-effect relationships between dysregulated
transcriptional responses in microglia and astrocytes identified
two transcriptional modules in astrocytes, potentially controlled by
microglia-produced Tgfa and Vegfb during EAE (Extended Data
Fig. 2a and Fig. 2c). Similar microglial Vegfb expression levels were
detected throughout the CNS during EAE; Tgfa expression was slightly
decreased in spinal cord microglia (Extended Data Fig. 2b).

Microglial AHR deletion decreased Tgfa and increased Vegfb
expression during EAE (Fig. 2d). AHR regulates gene expression
by direct interactions with target DNA regions, and also by
controlling other transcription factors such as NF-κB. We
identified AHR and NF-κB responsive elements (XREs and NREs,
respectively) in the Vegfb and Tgfa promoters (Extended Data
Fig. 2c). AHR deletion increased NF-κB p65 recruitment to NREs
in the Vegfb promoter in microglia during EAE (Extended Data
Fig. 2d). NF-κB p65 transactivated the Vegfb promoter in reporter
assays; AHR suppressed this transactivation as well as Vegfb
promoter basal activity (Extended Data Fig. 2e). AHR was also
Fig. 3 | Trp metabolites control microglia-astrocyte interactions and
CNS inflammation. a, Clinical scores in control (left) and CX3CR1-AHR
(right) mice treated with TDD, TDD plus Trp or TDD plus I3S from day
21 after EAE induction (n = 10 mice per group). Data are representative of
two independent experiments. b, Microglia were isolated and subjected to
RNA sequencing (RNA-seq). Heat map of expressed genes of normalized
reads of two independent samples per group. c, t-distributed stochastic
neighbour embedding (tSNE) plot of RNA-seq data from
microglia of mice as in b. d, Microglial mRNA expression determined by
qPCR in EAE mice as in a. Data are representative of two independent
experiments with three replicates. e, mRNA expression determined by
qPCR in microglia from EAE mice as in a. Data are representative of two
independent experiments with three replicates. f, Heat map depicting
mRNA expression in astrocytes from EAE mice as in a, as determined by
RNA-seq of normalized reads of two independent samples per group. Data
in a, d and e are mean ± s.e.m. P values were derived by two-way ANOVA
(a) or one-way ANOVA followed by Tukey’s post-hoc test (d, e).


recruited to XREs in the Tgfa promoter in microglia (Extended
Data Fig. 2f) and transactivated the Tgfa promoter in reporter studies
(Extended Data Fig. 2g). These findings suggest that AHR regulates
microglial Tgfa and Vegfb expression through its direct effects on
the Tgfa and Vegfb promoters, and through its ability to limit NF-κB
activation.

We then analysed the effects of microglial TGFα and VEGF-B on
astrocytes. Microglial AHR inhibition decreased Tgfa and increased
Vegfb expression (Extended Data Fig. 3a), and it boosted pro-inflammatory
Ccl2 and neurotoxic Nos2 expression induced in astrocytes by
microglial supernatants (Extended Data Fig. 3b-f). Antibody blockade
showed that TGFα and VEGF-B mediated these effects, with a relative
dominance of the suppressive effects of TGFα (Fig. 2e). Recombinant
TGFα decreased pro-inflammatory chemokine (Ccl2 and Csf2) and
cytokine (Il6) expression induced in mouse astrocytes by TNF and
IL-1, whereas it enhanced Il10 expression (Extended Data Fig. 3g).
Conversely, VEGF-B boosted Ccl2, Csf2 and Nos2 expression in
astrocytes. Similarly, VEGF-B pretreatment enhanced the toxicity of
astrocyte-conditioned medium (ACM) towards neurons and oligodendrocytes;
TGFα reduced this toxicity (Extended Data Fig. 3h, i).
VEGF-B pretreatment also enhanced pro-inflammatory monocyte
recruitment and microglia activation by ACM; these activities were
inhibited by TGFα (Extended Data Fig. 3j, k). These data suggest that
TGFα and VEGF-B control astrocyte functions that contribute to CNS
pathology.

To investigate the interaction between microglia and astrocytes in 
vivo, we knocked down Tgfa and Vegfb expression in microglia using 
lentivirus-delivered short-hairpin RNAs (shRNAs) expressed under 
the control of the Itgam (encoding CD11b) promoter. The knockdown 
did not affect astrocyte numbers or morphology, nor Tgfa and Vegfb 
expression in CNS-infiltrating monocytes (Extended Data Fig. 4a-f). 
Microglial Tgfa knockdown worsened EAE, whereas Vegfb knockdown
ameliorated it (Fig. 2f). Similar observations were made when
the TGFα or VEGF-B receptors ErbB1 or FLT-1, respectively, were
knocked down in astrocytes (Fig. 2g, Extended Data Fig. 4a, b, e, g). Of
note, VEGF-B administration did not induce demyelination in naive
mice (Extended Data Fig. 4h), suggesting that VEGF-B synergizes with
other factors to boost EAE pathology. Moreover, the knockdown of
Tgfa and Vegfb in astrocytes or of their receptors in microglia did not
affect EAE (Extended Data Fig. 5), supporting a microglia-to-astrocyte
directionality in their effects.

Transcriptional analyses suggested that TGFα-ErbB1 and VEGF-B-FLT-1
signalling regulate NF-κB in astrocytes (Extended Data Fig. 6a-d), which is known
to drive their pathogenic activities during CNS inflammation.
Indeed, NF-κB signalling in astrocytes was increased after microglial
AHR deletion (Extended Data Fig. 2a). Interestingly, VEGF-B boosted
astrocytic NF-κB activation; this boost was inhibited by TGFα
(Extended Data Fig. 6e). Moreover, NF-κB blockade suppressed the
increase in pro-inflammatory gene expression induced by VEGF-B in
astrocytes (Extended Data Fig. 6f, g). Collectively, these findings suggest
that by controlling NF-κB signalling, VEGF-B and TGFα modulate
astrocyte pathogenic activities.

The microbial metabolism of dietary tryptophan (Trp) generates
AHR agonists such as I3S, which limits astrocyte pathogenic activities
and EAE development. To investigate the role of microglial
AHR in the control of CNS inflammation by dietary Trp metabolites,
we subjected control and CX3CR1-AHR mice to a Trp-depleted
diet (TDD) initiated 21 days after EAE induction. Trp depletion
interfered with disease recovery in control mice (Fig. 3a). Trp or
I3S administration ameliorated EAE in control but not in CX3CR1-AHR
TDD-fed mice (Fig. 3a), suggesting that microglial AHR
participates in EAE amelioration by Trp metabolites. In addition,
a TDD initiated 14 days after EAE induction worsened disease in
control mice and also in mice with AHR-deficient microglia or astrocytes
(Extended Data Fig. 7a-c), suggesting that Trp metabolites
limit CNS inflammation via both microglial and astrocytic AHR.
Indeed, I3S administration initiated 14 days after EAE induction
ameliorated disease via AHR in astrocytes and microglia (Extended
Data Fig. 7b, c). Similar results were obtained when AHR was
knocked down in astrocytes or microglia (Extended Data Fig. 7d-g).
Collectively, these findings suggest that microglial AHR deletion
Fig. 4 | VEGF-B and TGFα control human astrocytes and are expressed
by CD14+ cells in MS lesions. a-c, mRNA expression determined by
qPCR in human microglia activated in the presence of I3S or the AHR
antagonist CH223191, 24 h after activation. n = 4 biological replicates. Data
are mean ± s.e.m. and representative of three independent experiments.
P values were derived by one-way ANOVA followed by Tukey’s
post-hoc test. d, mRNA expression determined by qPCR in primary
human astrocytes activated in the presence of TGFα or VEGF-B. Data
are mean ± s.e.m. and representative of three independent experiments
from four biological replicates. P values were derived by one-way ANOVA
followed by Tukey’s post-hoc test. e, Immunofluorescence staining
for AHR (A, red), VEGF-B (B, red), or TGFα (C, red), CD14 (green),


renders astrocytes unresponsive to the anti-inflammatory effects of 
AHR ligands at later EAE stages. 

The transcriptional response of microglia in TDD-fed control 
mice resembled that of AHR-deficient microglia (Fig. 3b, c). Indeed, 
the microglial expression of Ahr and its target gene Cyp1b1 was 
suppressed in TDD-fed and CX3CR1-AHR mice, and could be 
restored by Trp or I3S supplementation in control but not in CX3CR1- 
AHR mice (Fig. 3d). Moreover, dietary Trp metabolites promoted 
Socs2 expression, which mediates NF-kB regulation by AHR!!! 
and suppressed the microglial expression of NF-KB dependent 
transcripts such as Tnf (also known as Tnfa) in an AHR-dependent 
manner (Fig. 3b and Extended Fig. Data 8a). In addition, dietary 
Trp metabolites also regulated microglial Tgfa and Vegfb expression 
via AHR (Fig. 3e). Accordingly, astrocytes from CX3CR1-AHR and 
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and DAPI (blue) in human brain samples corresponding to normally 
appearing white matter (NAWM), active, and chronic MS lesions. Data 
are representative of n= 12 fields from three distinct MS brains. Images 
on the right show magnified views. Insets highlight co-expression of 
AHR and CD14, VEGF-B and CD14, and TGFa and CD14. f, Myelin 
oligodendrocyte glycoprotein (green) staining in tissues from patients 
with MS. Nuclear staining was done using Hoechst (blue). Representative 
sections of NAWM, active and chronic lesions from patients with MS 
(n=3). g, Quantification of AHR, VEGF-B and TGFa expression in 
NAWM, active, or chronic lesions in MS tissue. Data are mean + s.e.m 
and representative of 25 fields from 3 distinct MS brains. P values were 
determined by one-way ANOVA followed by Tukey’s post-hoc test. 


TDD-treated control mice showed increased expression of genes 
linked to EAE pathogenesis such as Ccl2 and Nos2 (Extended Data 
Fig. 8b and Fig. 3f). Collectively, these findings suggest that dietary 
Trp metabolites such as 13S limit NF-KB driven pro-inflammatory 
programs in microglia and suppress their ability to promote pro- 
inflammatory activities in astrocytes. 

We validated our observations using human samples. AHR was activated by I3S and inhibited by the antagonist CH223191 in primary human microglia, as indicated by the expression of AHR and its target CYP1B1 (Fig. 4a). Microglial AHR activation suppressed pro-inflammatory and neurotoxic gene expression (TNF, IL6, IL12A and NOS2) and boosted anti-inflammatory IL10 expression; AHR activation also promoted TGFA and suppressed VEGFB expression in human microglia (Fig. 4b, c). More importantly, TGFa and VEGF-B suppressed and boosted, respectively, pro-inflammatory gene expression in primary human astrocytes (Fig. 4d).

Finally, we analysed AHR, TGFa and VEGF-B expression in brain samples from patients with MS. We detected AHR, TGFa and VEGF-B expression in CD14+ cells (microglia and recruited monocytes) in normal-appearing white matter (NAWM) and in demyelinated active and chronic MS lesions; the highest AHR, VEGF-B and TGFa expression was detected in CD14+ cells in active MS lesions (Fig. 4e, f and Extended Data Fig. 8c). VEGF-B and AHR expression in chronic MS lesions was comparable to that in NAWM, whereas TGFa expression fell below the levels detected in NAWM, resulting in a higher VEGF-B/TGFa ratio than in NAWM (Fig. 4g and Extended Data Fig. 8d). These findings suggest that VEGF-B and TGFa participate in the control of astrocytes by microglia in humans and contribute to MS pathogenesis.

In summary, we found that AHR-controlled microglial VEGF-B and TGFa regulate astrocyte pathogenic activities during EAE. VEGF-A promotes CNS pathology by several mechanisms, including the induction of angiogenesis and the stimulation of microglia and T cells. Less is known about VEGF-B, which does not promote angiogenesis in the CNS and shows neuroprotective effects in some models. Our data suggest that FLT-1 activation in astrocytes by VEGF-B produced by microglia and other sources promotes CNS inflammation, which identifies inhibitors of VEGF-B–FLT-1 signalling as candidate therapeutics for CNS inflammation. Conversely, TGFa induces astrogliosis and the production of neuroprotective factors, and increases neuronal survival and axonal growth in multiple contexts, including models of spinal cord injury. Indeed, given that reactive astrocytes promote axon regeneration in spinal cord injury models, it is tempting to speculate that microglial TGFa promotes these beneficial astrocyte activities. Future studies should address whether the control of TGFa–ErbB1 signalling via AHR contributes to the beneficial effects of commensal bacteria on spinal cord injury. In conclusion, our findings define a gut–brain axis by which metabolites of dietary Trp, controlled by the commensal flora, act directly on CNS-resident microglia and astrocytes to limit inflammation and neurodegeneration via AHR.


Online content

Any Methods, including any statements of data availability and Nature Research reporting summaries, along with any additional references and Source Data files, are available in the online version of the paper at https://doi.org/10.1038/s41586-018-0119-x.
METHODS


Animals. C57BL/6J mice were obtained from the Jackson Laboratory and were all female. Cx3cr1CreERT2 mice were a gift from S. Jung and were bred to Ahrfl/fl mice. To delete microglial AHR, 4- to 5-week-old mice were injected subcutaneously with 4 mg tamoxifen (Sigma) in 200 μl warm corn oil at two time points 48 h apart. Cx3cr1CreERT2-negative Ahrfl/fl mice were used as controls. Four weeks later, EAE was induced. To delete AHR in all CX3CR1-expressing cells, control and CX3CR1-AHR mice were gavaged weekly with tamoxifen starting from 5 weeks of age. EAE was induced and weekly tamoxifen gavages were continued after EAE induction.
Bone marrow chimaeras were generated as previously described to minimize irradiation-induced artefacts. In brief, 7-week-old wild-type recipient mice were lethally irradiated with a dose of 9.5 Gy. One day later, mice were administered 5 x 10° bone marrow cells isolated from donor femora and tibiae by intravenous injection. Donors were Cx3cr1CreERT2-negative Ahrfl/fl and Cx3cr1CreERT2-positive Ahrfl/fl mice. Bone marrow recipients were then rested for 3 weeks and thereafter treated with weekly tamoxifen gavages for another 3 weeks; after a total of 6 weeks, EAE was induced as described below. Tamoxifen administration was continued weekly during EAE. All mice were on the C57BL/6 background and were kept in a pathogen-free facility at the Harvard Institutes of Medicine. All experiments were carried out in accordance with guidelines prescribed by the Institutional Animal Care and Use Committee (IACUC) at Harvard Medical School.
EAE induction and treatment. EAE was induced in 8-week-old mice by subcutaneous immunization with 150 μg per mouse of myelin oligodendrocyte glycoprotein peptide (MOG35-55; Genemed Synthesis Inc.) emulsified in complete Freund's adjuvant (CFA, Difco Laboratories), followed by administration of 200 ng pertussis toxin (PTX, List Biological Laboratories, Inc.) on days 0 and 2, as described. Clinical signs of EAE were assessed as follows: 0, no signs of disease; 1, loss of tone in the tail; 2, hind limb paresis; 3, hind limb paralysis; 4, tetraplegia; 5, moribund. All agents were purchased from Sigma-Aldrich.
Isolation of cells from adult mouse CNS. Mononuclear cells were isolated from the CNS as previously described, and astrocytes, monocytes and microglia were sorted as described before. Isolated CNS cells were stained with fluorochrome-conjugated antibodies to CD11b (M1/70, 1:100), CD45 (90, 1:100), Ly6C1 (HK1.4, 1:100), CD105 (N418, 1:100), CD140a (APA5, 1:100), CD11c (N418, 1:100), F4/80 (BM8, 1:50), O4 (O4, Miltenyi Biotec, 1:10) and CD19 (eBio1D3, 1:100). All antibodies were from eBioscience or BD Pharmingen, unless otherwise mentioned. Microglia were sorted as CD11b+ cells with low CD45 and low Ly6C1 expression (CD11b+CD45loLy6C1lo); inflammatory monocytes were defined as CD45hiCD11b+Ly6C1hi. Astrocytes were sorted as CD11bloCD45loLy6C1loCD105loCD140aloF4/80loO4loCD19lo after the exclusion of lymphocytes, microglia, oligodendrocytes and monocytes.
Flow cytometry staining and acquisition. Mononuclear cell suspensions were prepared as previously described. Antibodies for flow cytometry were purchased from eBioscience or BD Pharmingen and used at a dilution of 1:100 unless the manufacturer recommended otherwise. Mouse AHR antibody (IC6697G) and mouse FLT-1 antibody (FAB4711A) were from R&D Systems, VEGF-B (RM0008-6E72) and TGFa (MF9) antibodies from Novus Biologicals, and EGFR (D38B1) and p-p65 (93H1) antibodies from Cell Signaling. Cells were then analysed on an LSRII or MACSQuant flow cytometer (BD Biosciences and Miltenyi Biotec, respectively). As outlined in the individual figures, T-helper 1 (TH1) cells were defined as CD3+CD4+IFNγ+IL-17−IL-10−FOXP3−, TH17 cells as CD3+CD4+IFNγ−IL-17+IL-10−FOXP3−, T-regulatory (Treg) cells as CD3+CD4+IFNγ−IL-17−FOXP3+, microglia as CD11b+CD45loLy6Clo, and pro-inflammatory monocytes as CD45hiCD11b+Ly6Chi.
RNA-seq. Mice were euthanized at day 25 after disease induction and astrocytes were isolated as described above. RNA was sequenced using the strand-specific TruSeq protocol. High-coverage (>50 M) strand-specific paired-end 76-base-pair (bp) reads were aligned to the mm10/GRCm38 mouse reference genome using TopHat v2.0.11. Gene expression levels were estimated for 38,922 GENCODE Release M2 (GRCm38.p2) mouse gene annotations using Cuffquant and Cuffnorm v2.2.1 and expressed as quartile-normalized FPKMs.
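The sketch below illustrates the idea behind "quartile-normalized FPKM" (fragments per kilobase of transcript per million mapped fragments). It replaces each library's total fragment count with an effective library size derived from that sample's upper quartile (75th percentile) of non-zero gene counts. This is a simplified assumption for illustration, not Cuffnorm's exact implementation. The function names and toy counts are hypothetical.

```python
import numpy as np


def upper_quartile_library_sizes(counts: np.ndarray) -> np.ndarray:
    """Effective library size per sample (columns) from the upper quartile.

    Each sample's 75th percentile of non-zero gene counts is rescaled so that
    the effective sizes have the same mean as the raw total counts
    (a simplifying assumption made for this sketch).
    """
    counts = np.asarray(counts, dtype=float)
    uq = np.array([np.percentile(col[col > 0], 75) for col in counts.T])
    return uq * counts.sum(axis=0).mean() / uq.mean()


def fpkm(counts: np.ndarray, lengths_bp: np.ndarray, library_sizes: np.ndarray) -> np.ndarray:
    """FPKM = fragments * 1e9 / (gene length in bp * library size)."""
    counts = np.asarray(counts, dtype=float)
    lengths = np.asarray(lengths_bp, dtype=float)[:, None]  # genes x 1
    return counts * 1e9 / (lengths * np.asarray(library_sizes, dtype=float)[None, :])


# Toy example: sample 2 is sample 1 sequenced twice as deeply.
counts = np.array([[10, 20], [0, 0], [50, 100], [200, 400]])
lengths = np.array([1000, 2000, 1500, 3000])
uq_fpkm = fpkm(counts, lengths, upper_quartile_library_sizes(counts))
print(np.allclose(uq_fpkm[:, 0], uq_fpkm[:, 1]))  # True: depth difference removed
```

An upper-quartile scale factor is less sensitive than the total count to a handful of very highly expressed genes. That robustness is the usual motivation for this choice.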
nCounter gene expression. In total, 50 ng of total RNA was hybridized with the reporter and capture probes of a custom-made, astrocyte-targeted nCounter Gene Expression CodeSet according to the manufacturer's instructions (NanoString Technologies). Data were analysed using nSolver Analysis software.
qPCR. RNA was extracted using the RNeasy kit (Qiagen), and cDNA was prepared and used for qPCR, with results normalized to Gapdh. All primers and probes were from Applied Biosystems. Mouse: Ahr Mm00478932_m1, Aldh1a1 Mm00657317_m1, Aqp4 Mm00802131_m1, Ccl20 Mm01268754_m1, Ccl2 Mm00441242_m1, Ccl8 Mm01297183_m1, Cxcl3 Mm01701838_m1, Cyp1b1 Mm00487229_m1, Erbb1 Mm01187858_m1, Flt1 Mm00438980_m1, Gapdh Mm99999915_g1, Gfap Mm01253033_m1, Il10 Mm00439614_m1, Il12a Mm00434165_m1, Il23a Mm01160011_g1, Il6 Mm00446190_m1, Itgam Mm00434455_m1, Nos2 Mm00440502_m1, Tgfa Mm00446232_m1, Tnf Mm00443258_m1 and Vegfb Mm00442102_m1. Human: AHR Hs00169233_m1, CCL2 Hs00234140_m1, CYP1B1 Hs02382916_s1, ERBB1 (also known as EGFR), FLT1, IL6 Hs00985639_m1, NOS2 Hs01075529_m1, TGFA Hs00608187_m1, TNF Hs01113624_g1 and VEGFB Hs00173634_m1.
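The Methods state only that qPCR results were normalized to Gapdh. A common way to compute such relative expression is the comparative 2^-ΔΔCt method, sketched below under the assumption of roughly 100% amplification efficiency for target and reference assays. The paper does not confirm that this exact calculation was used. The Ct values and function names are hypothetical.

```python
def delta_ct(ct_target: float, ct_reference: float) -> float:
    """Normalize a target Ct to the reference gene (here Gapdh) in the same sample."""
    return ct_target - ct_reference


def relative_expression(ct_target: float, ct_ref: float,
                        ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Fold change versus a control sample by the 2^-ddCt method.

    Assumes ~100% amplification efficiency for target and reference assays.
    """
    ddct = delta_ct(ct_target, ct_ref) - delta_ct(ct_target_ctrl, ct_ref_ctrl)
    return 2.0 ** (-ddct)


# Hypothetical Ct values: Tnf vs Gapdh in a treated and a control sample
print(relative_expression(22.0, 18.0, 24.0, 18.0))  # 4.0
```

A treated sample whose target crosses threshold two cycles earlier than the control, relative to Gapdh, therefore shows a four-fold higher expression.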

T cell proliferation. Splenocytes were cultured in X-VIVO 15 medium (Lonza) and plated for 72 h at a density of 5 x 10° cells per well with increasing concentrations of MOG35-55 peptide. During the final 16 h, cells were pulsed with 1 μCi [3H]thymidine (PerkinElmer), followed by collection on glass fibre filters and analysis of incorporated [3H]thymidine in a beta-counter (1450 MicroBeta TriLux; PerkinElmer). For intracellular cytokine staining, cells were stimulated for 4 h with PMA (phorbol 12-myristate 13-acetate; 50 ng ml−1; Sigma), ionomycin (1 μg ml−1; Sigma) and monensin (GolgiStop; 2 μM; BD Biosciences). After staining of surface markers, cells were fixed and permeabilized according to the manufacturer's instructions (BD Cytofix/Cytoperm Kit (BD Biosciences) or FOXP3 Fixation/Permeabilization (eBioscience)).

Primary astrocyte and microglia cultures. Cerebral cortices from neonatal mice (1-3 days old) were dissected, carefully stripped of their meninges, digested with 0.25% trypsin-EDTA and DNase I (1 mg ml−1) for 15 min, and dispersed to a single-cell suspension by passing through a cell strainer (70 μm). The cell suspension was then cultured at 37 °C in humidified 5% CO2, 95% air in 75-cm² cell culture flasks precoated with poly-L-lysine (Sigma). Medium was replaced every 4-5 days. After 7-10 days, cells reached confluence and astrocytes and microglia were isolated by mild trypsinization with trypsin-EDTA (0.06%) as previously described. Astrocyte cultures were >95% astrocytes as determined by staining with GFAP or GLAST, with less than 5% contamination by CD11b+ microglia (not shown). Conversely, >95% of cells in microglia cultures stained CD11b+CD45loLy6C1lo. After the isolation procedure, cells were plated as required for the specific experiments. Concentrations of agents were 100 ng ml−1 for LPS (Sigma), 50 μg ml−1 for poly(I:C), 100 ng ml−1 for IL-1β, 50 ng ml−1 for TNF, 0.1 ng ml−1 for TGFa and 10 ng ml−1 for VEGF-B (all R&D Systems), 50 μg ml−1 for 3-indoxyl-sulfate (Sigma) and 100 nM for the NF-κB blocker bengamide B (Tocris). Unless otherwise indicated, RNA was isolated 24 h after the start of treatment. For western blot analysis, cells were pretreated with I3S or vehicle for 24 h; LPS was then added and protein was prepared 2 h later.
Plasmids. Constructs encoding p65 and AHR, pTgfa-Luc and pVegfb-Luc, as well as pTK-Renilla, were obtained from Addgene. The pLenti-GFAP-EGFP-mir30-shAct1 vector was a gift from G.-X. Zhang. The pLenti-CD11b-EGFP-mir30-shRNA vector was also provided by G.-X. Zhang, who generated it by replacing the Gfap promoter with the Cd11b (also known as Itgam) promoter.

In vivo knockdown with shRNA lentivirus. shRNA sequences against Ahr, Tgfa, Vegfb, Erbb1 or Flt1 and a non-targeting control shRNA were cloned into pLenti-GFAP-EGFP-mir30-shRNA or pLenti-CD11b-EGFP-mir30-shRNA as described, using the following validated shRNA sequences: Ahr (5'-CCGGCATCGACATAACGGACGAAATCTCGAGATTTCGTCCGTTATGTCGATGTTTTTG-3'), Tgfa (5'-CCGGTCGTCAGGATGCGTGTCTTATCTCGAGATAAGACACGCATCCTGACGATTTTTG-3'), Vegfb (5'-CCGGGCCAATGTGAATGCAGACCAACTCGAGTTGGTCTGCATTCACATTGGCTTTTTG-3'), Erbb1 (5'-CCGGGCTGGATGATAGATGCTGATACTCGAGTATCAGCATCTATCATCCAGCTTTTTG-3') and Flt1 (5'-CCGGCGTGACCTTTAATCGTGCTTTCTCGAGAAAGCACGATTAAAGGTCACGTTTTT-3'). Lentivirus particles were generated by transfecting HEK-293FT cells (Invitrogen) with the pLenti-GFAP-EGFP-mir30-shRNA or pLenti-CD11b-EGFP-mir30-shRNA vector and the ViraPower Packaging mix (helper plasmids pLP1, pLP2 and pLP/VSV-G; Invitrogen). Supernatants were collected, filtered through a 0.45-μm PVDF filter and concentrated overnight with the Lenti-X concentrator kit (Clontech). The viral titre was determined using the Lenti-X qRT-PCR titration kit (Clontech). For in vivo knockdown, immunized mice were anaesthetized at the indicated time points, positioned in a Kopf Stereotaxic Alignment System and injected with 10’ IU of the respective virus using a Hamilton syringe 0.44 mm posterior to the bregma, 1.0 mm lateral to it and 2.3 mm below the skull surface. The injection system was retracted slowly, skin incisions were closed carefully with surgical sutures, and mice were allowed to wake up in a cage pre-warmed with a red light and were checked twice daily thereafter.
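The listed oligos appear to follow a TRC/pLKO-style hairpin layout. Each has a CCGG overhang, a 21-nt sense stem, a CTCGAG loop, a 21-nt antisense stem and a poly-T terminator. This layout is an assumption inferred from the sequences themselves; the Methods do not state it. The hypothetical helper below splits an oligo under that assumption and checks that the antisense stem is the reverse complement of the sense stem. It is a quick sanity check for transcription errors when copying sequences.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def parse_hairpin(oligo: str, stem_len: int = 21,
                  loop: str = "CTCGAG", overhang: str = "CCGG") -> tuple[str, str, str]:
    """Split an oligo into (sense, antisense, tail), assuming overhang-stem-loop-stem-tail."""
    oligo = oligo.replace(" ", "").upper()
    if not oligo.startswith(overhang):
        raise ValueError("missing 5' overhang")
    loop_start = len(overhang) + stem_len
    if oligo[loop_start:loop_start + len(loop)] != loop:
        raise ValueError("loop not found at the expected position")
    sense = oligo[len(overhang):loop_start]
    antisense_start = loop_start + len(loop)
    antisense = oligo[antisense_start:antisense_start + stem_len]
    tail = oligo[antisense_start + stem_len:]
    return sense, antisense, tail


def is_self_complementary_hairpin(oligo: str, **kwargs) -> bool:
    """True if the antisense stem is the exact reverse complement of the sense stem."""
    try:
        sense, antisense, _ = parse_hairpin(oligo, **kwargs)
    except ValueError:
        return False
    return antisense == reverse_complement(sense)


ahr = "CCGGCATCGACATAACGGACGAAATCTCGAGATTTCGTCCGTTATGTCGATGTTTTTG"
print(parse_hairpin(ahr)[0])              # CATCGACATAACGGACGAAAT
print(is_self_complementary_hairpin(ahr))  # True
```

The terminator (TTTTTG or TTTTT) is returned as `tail` but not checked, because the listed oligos differ in that region.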

Assessment of toxicity towards neurons and oligodendrocytes. N2A neuronal cells (ATCC CCL-131, ATCC) or mouse oligodendrocytes (Celprogen, 11004-02) were grown in 96-well plates and pre-activated with mouse IFN (100 ng ml−1, R&D Systems) for 24 h. After extensive PBS washes, the medium was supplemented with astrocyte-conditioned medium (ACM). Cytotoxicity was determined after 24 h by quantifying LDH release (CytoTox 96 Non-Radioactive Cytotoxicity Assay, Promega) according to the manufacturer's protocol.
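LDH-release readouts are commonly converted to percent cytotoxicity by scaling each experimental well between spontaneous release (untreated cells) and maximum release (fully lysed cells), after subtracting a medium-only background. The sketch below uses that common convention. The Methods do not specify which controls or formula were used, so treat the control wells and the function name as assumptions.

```python
def percent_cytotoxicity(experimental: float, spontaneous: float,
                         maximum: float, background: float = 0.0) -> float:
    """Percent cytotoxicity from LDH absorbance readings.

    experimental: cells exposed to ACM; spontaneous: untreated cells;
    maximum: fully lysed cells; background: medium-only wells.
    """
    exp_ = experimental - background
    spon = spontaneous - background
    max_ = maximum - background
    if max_ == spon:
        raise ValueError("maximum and spontaneous release must differ")
    return 100.0 * (exp_ - spon) / (max_ - spon)


print(percent_cytotoxicity(0.75, 0.25, 1.25))  # 50.0
```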

Monocyte migration assay. Splenic monocytes from wild-type mice were pre-enriched with CD11b beads (Miltenyi) and sorted as F4/80+SSCloLy6Chi. These monocytes were seeded in the upper chamber of a 24-well cell culture insert with 5-μm pore size (Corning) containing ACM. Migrating monocytes were quantified in the lower chamber after 3 h.
Microglia polarization assays. Wild-type microglia were co-cultured with astrocytes that had been pre-treated with TGFa or VEGF-B and extensively washed. After 24 h, microglia were re-isolated, and RNA was isolated, reverse transcribed and subjected to qPCR analysis.

Subcellular fractionation and immunoblot analysis. In vitro microglia cultures were treated as indicated in specific experiments, and subcellular fractions were generated using the Cell Fractionation kit (Cell Signaling). Nuclear and cytoplasmic fractions (10 μg each) were separated on 4-12% Bis-Tris NuPAGE gels (Invitrogen) and transferred onto PVDF membranes (Millipore). Rabbit anti-GAPDH monoclonal antibody (14C10, Cell Signaling), anti-histone H3 rabbit polyclonal antibody (EMD Millipore) and anti-NF-κB p65 rabbit monoclonal antibody (D14E12, Cell Signaling) were used as primary antibodies, followed by goat anti-rabbit IgG horseradish peroxidase (HRP)-linked antibody (7074S, Cell Signaling). All antibodies were used at a dilution of 1:1,000. Blots were developed using the SuperSignal West Femto Maximum Sensitivity kit (Thermo Scientific/Life Technologies). Signals were quantified using ImageJ software 1.48v (NIH), and specific signals were normalized to GAPDH (cytoplasm) or histone H3 (nucleus).
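The normalization step described above can be sketched as follows. Each band intensity is divided by the loading control of its own fraction (histone H3 for nuclear, GAPDH for cytoplasmic lanes), and the result is expressed relative to a control lane such as vehicle. Expressing values as fold change over a control lane is an assumption of this sketch. The intensities and function names are hypothetical.

```python
def normalized_band(target: float, loading_control: float) -> float:
    """Target band intensity divided by its fraction-specific loading control."""
    if loading_control <= 0:
        raise ValueError("loading control intensity must be positive")
    return target / loading_control


def fold_change(target: float, loading: float,
                target_ctrl: float, loading_ctrl: float) -> float:
    """Normalized signal relative to a control lane (e.g., vehicle)."""
    return normalized_band(target, loading) / normalized_band(target_ctrl, loading_ctrl)


# Hypothetical densitometry values (arbitrary units)
nuclear = fold_change(3000.0, 1500.0, 1000.0, 1000.0)        # p65 / histone H3
cytoplasmic = fold_change(800.0, 2000.0, 1600.0, 2000.0)     # p65 / GAPDH
print(nuclear, cytoplasmic)  # 2.0 0.5
```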

Chromatin immunoprecipitation. Cells were cross-linked with 1% paraformaldehyde and lysed with 350 μl lysis buffer (1% SDS, 10 mM EDTA, 50 mM Tris-HCl, pH 8.1) containing 1× protease inhibitor cocktail (Roche Molecular Biochemicals). Chromatin was sheared by sonication, and supernatants were collected after centrifugation and diluted in chromatin immunoprecipitation incubation buffer (1% Triton X-100, 2 mM EDTA, 150 mM NaCl, 20 mM Tris-HCl, pH 8.0). Ten micrograms of antibody was prebound for 6 h to protein A and protein G Dynal magnetic beads (Invitrogen), washed three times with PBS plus 1% bovine serum albumin (BSA), and then added to the diluted chromatin for immunoprecipitation overnight with rotation. The magnetic bead-chromatin complexes were then washed three times in RIPA buffer (50 mM HEPES (pH 7.6), 1 mM EDTA, 0.7% sodium deoxycholate, 1% NP-40, 0.5 M LiCl) and then twice with Tris-EDTA (TE) buffer. Immunoprecipitated chromatin was extracted with 1% SDS, 0.1 M NaHCO3 and heated at 65 °C for 8 h to reverse the paraformaldehyde cross-linking. DNA fragments were purified with a QIAquick DNA purification kit (Qiagen) and analysed using the SYBR Green real-time PCR kit (Takara Bio Inc.). Anti-AHR (BML-SA210, Enzo Life Sciences), anti-NF-κB p65 (D14E12) XP rabbit monoclonal antibody (8242, Cell Signaling Technology) and recombinant IgG isotype control were used as indicated in specific experiments. The following primer pairs
were used: VEGFB_NFkB1: forward 5'-TCTGTGGCATAGAAACCCAAAG-3', reverse 5'-ACCCTAAGTCACTGGCTGTC-3'; VEGFB_AHR1: forward 5'-ACCTTCTTCACAGGACAGCC-3', reverse 5'-AGTCTCCGAACTCTGGTGTC-3'; VEGFB_AHR2: forward 5'-GAGTTAACTGCAATTCCTTCACA-3', reverse 5'-CTGGAGGGTGGTGCTGAAG-3'; VEGFB_NFkB2/AHR3: forward 5'-TTCATTGGTCCTCTCCCTGC-3', reverse 5'-CAGGGGAAAGGGGACACAC-3'; VEGFB_AHR3+4: forward 5'-GTCCCCTTTCCCCTGCAG-3', reverse 5'-AGAGGCTCATGTGACCTAAACA-3'; TGFA_AHR1: forward 5'-GCCAAGGGAGCATGAACTAG-3', reverse 5'-GATGCTCAAAGTTTCAGAGTTGA-3'; TGFA_AHR2: forward 5'-AGGAGAGGGGTCAGTCTGAT-3', reverse 5'-AGAGGGAAACACAAGAAGGGA-3'; TGFA_AHR3: forward 5'-GACTCAGAGTGGGGCCAG-3', reverse 5'-GAGTCGCTCAGGATCCAGTC-3'.
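The Methods do not say how ChIP-qPCR signals were expressed. Two common conventions are percent of input and fold enrichment over the IgG isotype control, and the sketch below computes both. It assumes an input sample that represents a known fraction of the chromatin (for example 1%) and near-100% primer efficiency. The function names and Ct values are hypothetical.

```python
def percent_input(ct_ip: float, ct_input: float, input_fraction: float) -> float:
    """Percent of input chromatin recovered by the IP.

    input_fraction: share of chromatin used as input (e.g., 0.01 for a 1% input).
    """
    if not 0 < input_fraction <= 1:
        raise ValueError("input_fraction must be in (0, 1]")
    return 100.0 * input_fraction * 2.0 ** (ct_input - ct_ip)


def fold_over_igg(ct_ip: float, ct_igg: float) -> float:
    """Enrichment of the specific antibody over the IgG isotype control."""
    return 2.0 ** (ct_igg - ct_ip)


# Hypothetical Ct values for one Vegfb promoter amplicon
print(percent_input(ct_ip=28.0, ct_input=25.0, input_fraction=0.01))
print(fold_over_igg(ct_ip=28.0, ct_igg=31.0))  # 8.0
```

Percent input corrects for how much chromatin went into each IP. Fold over IgG reports specificity relative to background binding, so the two are often shown together.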

Human primary astrocytes and microglia. Human fetal astrocytes and microglia were isolated as previously described from CNS tissue of fetuses at 17-23 weeks of gestation, obtained from the Human Fetal Tissue Repository (Albert Einstein College of Medicine) and from the University of Washington Birth Defects Research Laboratory (BDRL, Project Number 5R24HD000836-51), following Canadian Institutes of Health Research- and NIH-approved guidelines. Primary human astrocytes and microglia were treated with human VEGF-B (0.1 ng ml−1, R&D Systems), human TGFa (0.1 ng ml−1, R&D Systems) or vehicle, or with poly(I:C) (10 mg ml−1) with or without 3-indoxyl-sulfate (50 μM, Sigma), or were left untreated (control), as indicated in the respective figures. After 24 h, total RNA was isolated, reverse transcribed and subjected to qPCR.

Immunohistochemistry and astrocyte morphometry. Mice were anaesthetized (intraperitoneal injection of 100 mg ketamine and 5 mg xylazine per kg body weight) and transcardially perfused with PBS followed by 4% paraformaldehyde in PBS. CNS and other organs were post-fixed for 4-6 h at 4 °C, washed in PBS and incubated in 30% sucrose in PBS at 4 °C until fully equilibrated. Samples were embedded in OCT (Tissue-Tek) for frozen sectioning on a cryostat (Leica). All stainings were performed on 30-μm-thick transversal spinal cord sections. The sections were permeabilized in blocking solution (0.5% Triton X-100, 5% BSA, 5% normal donkey serum and 0.1% NaN3 in PBS) for 1 h at room temperature. Primary antibodies were diluted in blocking solution and incubated overnight at 4 °C: Lys310-acetyl p65 (ab52175, Abcam), SOX9 (AF3075, R&D, 1:200), GFAP (RBK037, Zytomed, 1:5,000) and IBA-1 (ab178846, Abcam, 1:500); DAPI (1:5,000) was used as a nuclear stain. Conjugated secondary antibodies, applied for 2 h at room temperature, were donkey anti-rabbit Alexa Fluor 647 (Invitrogen A-31573), donkey anti-rabbit Alexa Fluor 568 (Invitrogen A-10042) and donkey anti-goat Alexa Fluor 488 (Life Technologies A11055). TUNEL staining was performed using the In Situ Cell Death Detection Kit, TMR red (Roche, 12156792910). Slices were mounted with ProLong Diamond Antifade Mountant (Life Technologies, P36961).

For immunohistochemistry with DAB, spinal cords harvested as described above were cut transversally, embedded in paraffin blocks and cut into 4-μm-thick slices on a microtome. For antigen retrieval, slices were heated for 40 min in citrate buffer at 98 °C in a vapour cooker. They were then blocked for 1 h with 10% normal goat serum (SouthernBiotech, 0060-01) in PBS and incubated overnight at 4 °C with primary rat anti-Mac3 antibody (BD Pharmingen, 553322) diluted 1:200 in blocking buffer. Next, slices were incubated for 45 min at room temperature with biotin-conjugated goat anti-rat secondary antibody (SouthernBiotech, 3050-08) diluted 1:1,000 in blocking buffer. Finally, they were incubated for 45 min at room temperature with streptavidin-HRP (SouthernBiotech, 7100-05) diluted 1:1,000 in PBS. The samples were washed four times in PBS plus 0.1% Triton X-100 between each step. The DAB reaction was run for 5 min using the EnVision FLEX kit (Dako). Slides were counterstained with Gill's haematoxylin, dehydrated through an ethanol and xylol gradient and mounted with Vitro-Clud (R. Langenbrinck, 04-0001) using a Leica CV5030 system. All images were taken with a Keyence BZ-9000 microscope. For astrocyte reconstruction, a Leica SP8 confocal microscope was used, and Imaris 9.0.2 software was used to reconstruct astrocytes and obtain morphometric data.

In vivo demyelination assay. Naive mice were anaesthetized and injected stereotaxically into the corpus callosum (1.4 mm anterior, 1.0 mm lateral, 2.1 mm deep) with 2 μl of lysolecithin (1% (w/v), Sigma), VEGF-B (500 ng, R&D Systems) or PBS. Mice were euthanized 6 days later and subjected to myelin staining as described below.

MS tissue and immunofluorescence. Brain tissue was obtained from untreated individuals with clinically diagnosed and neuropathologically confirmed MS, and from healthy controls, as previously described. All individuals with MS and controls, or their next of kin, had given informed consent for autopsy and for the use of their brain tissue for research purposes. All procedures were performed in accordance with local Institutional Review Board guidelines. MS samples were processed and immunostained as previously described. In brief, sections were thawed, fixed, washed and blocked with 10% donkey serum. Sections were then incubated overnight at 4 °C with antibodies against AHR (rabbit anti-AHR, Enzo Life Sciences), CD14 (mouse anti-human CD14, 325602, BioLegend), TGFa (ab112030, Abcam) and VEGF-B (ab185696, Abcam). After washes, the samples were incubated at room temperature for 40 min with the secondary antibodies (donkey anti-rabbit RRX and donkey anti-mouse Alexa 488, Jackson ImmunoResearch). Imaging was performed using a Leica SP5 confocal microscope and Leica LAS AF software. Images were processed using Adobe Photoshop CS2. For imaging analysis, all data were acquired using the same settings, which were originally standardized on NAWM sections. The degree of co-localization of CD14 with AHR, TGFa and VEGF-B was determined using Volocity software (PerkinElmer). The overlap coefficient is expressed as a percentage, where 100% represents the maximum degree of co-localization and 0% denotes no co-localization.
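For intuition about the 0-100% scale, the sketch below computes Manders' overlap coefficient, R = Σ(a·b) / √(Σa² · Σb²), between two channels and reports it as a percentage. Whether Volocity's "overlap coefficient" uses exactly this definition and pixel selection (for example thresholding or ROI masking) is an assumption here. The toy images and function name are hypothetical.

```python
import numpy as np


def overlap_coefficient(channel_a, channel_b) -> float:
    """Manders' overlap coefficient between two channels, as a percentage.

    R = sum(a*b) / sqrt(sum(a^2) * sum(b^2)); 100 = complete overlap, 0 = none.
    """
    a = np.asarray(channel_a, dtype=float).ravel()
    b = np.asarray(channel_b, dtype=float).ravel()
    if a.shape != b.shape:
        raise ValueError("channels must have the same number of pixels")
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    if denom == 0:
        return 0.0
    return float(100.0 * np.sum(a * b) / denom)


cd14 = np.array([[0, 10], [20, 0]])
marker_same = np.array([[0, 5], [10, 0]])   # proportional signal in the same pixels
marker_apart = np.array([[7, 0], [0, 3]])   # signal only where CD14 is absent
print(round(overlap_coefficient(cd14, marker_same), 6))  # 100.0
print(overlap_coefficient(cd14, marker_apart))           # 0.0
```

Because the coefficient is normalized by both channels' intensities, a uniformly dimmer second channel in the same pixels still scores 100%. Acquiring all images with identical settings, as described above, keeps comparisons between lesion types meaningful.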

RNA-seq data processing. RNA-seq data were analysed using DESeq2. Genes with zero counts or low expression were removed before differential analysis; lowly expressed genes were filtered out by DESeq2 independent filtering, which removed genes in the lowest 40% quantile of mean normalized counts. Differentially expressed genes were selected at FDR < 0.1.
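The filtering and selection steps above can be sketched in NumPy. Genes in the lowest 40% of mean normalized counts are dropped, Benjamini-Hochberg adjustment is applied to the remaining p-values, and genes with adjusted p < 0.1 are kept. This is an illustration only. DESeq2's own independent filtering chooses the quantile that maximizes discoveries, and the fixed 40% here simply mirrors the reported value. The function names and toy inputs are hypothetical.

```python
import numpy as np


def benjamini_hochberg(pvalues) -> np.ndarray:
    """Benjamini-Hochberg adjusted p-values (FDR)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity: adjusted p at rank i is the minimum over ranks >= i
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


def select_de_genes(mean_norm_counts, pvalues, filter_quantile=0.4, fdr=0.1):
    """Drop genes in the lowest `filter_quantile` of mean normalized counts,
    BH-adjust the remaining p-values and flag genes with padj < fdr."""
    mean = np.asarray(mean_norm_counts, dtype=float)
    p = np.asarray(pvalues, dtype=float)
    keep = mean > np.quantile(mean, filter_quantile)
    padj = np.full(p.shape, np.nan)  # filtered genes get no adjusted p-value
    padj[keep] = benjamini_hochberg(p[keep])
    significant = keep & (np.nan_to_num(padj, nan=1.0) < fdr)
    return significant, padj


mean_counts = np.arange(1, 11)   # toy mean normalized counts for 10 genes
pvals = np.full(10, 0.001)
sig, padj = select_de_genes(mean_counts, pvals)
print(sig.sum())  # 6
```

Filtering before adjustment shrinks the multiple-testing burden. Low-count genes rarely reach significance, so removing them raises power for the rest.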

Heat maps. Heat maps were generated with the GENE-E program, and z-scores were calculated for each gene (row) using the mean expression of biological replicates.
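The row-wise z-scoring described above is sketched below. Replicates are first averaged per condition, then each gene is centred and scaled across conditions. The use of the population standard deviation (ddof = 0) and the handling of flat rows are assumptions; GENE-E's exact settings are not stated. The function name and toy data are hypothetical.

```python
import numpy as np


def row_zscores_of_group_means(expr, groups):
    """Average replicates per group, then z-score each gene (row) across groups.

    expr: genes x samples array; groups: one label per sample (column).
    Returns (group labels in first-seen order, genes x groups z-score matrix).
    Uses the population standard deviation (ddof=0), an assumption of this sketch.
    """
    expr = np.asarray(expr, dtype=float)
    labels = list(dict.fromkeys(groups))
    means = np.column_stack([
        expr[:, [i for i, g in enumerate(groups) if g == label]].mean(axis=1)
        for label in labels
    ])
    mu = means.mean(axis=1, keepdims=True)
    sd = means.std(axis=1, keepdims=True)
    sd = np.where(sd == 0, 1.0, sd)  # flat rows become all zeros instead of NaN
    return labels, (means - mu) / sd


labels, z = row_zscores_of_group_means(
    [[1, 3, 5, 7], [2, 2, 2, 2]], ["TDD", "TDD", "TDD+Trp", "TDD+Trp"]
)
print(labels)      # ['TDD', 'TDD+Trp']
print(z.tolist())  # [[-1.0, 1.0], [0.0, 0.0]]
```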

t-SNE plots. The t-SNE plot was created in R using the Rtsne package with the parameters perplexity = 1 and max iterations = 3,000. The mean of replicates and the top 500 ranked genes for TDD + Trp and TDD in control mice (n = 3) were used, and the final plot was generated using ggplot.

Ingenuity pathway analysis. To identify significant pathways, differentially expressed genes that passed FDR < 0.1 were uploaded to and analysed with the Ingenuity Pathway Analysis (IPA) tool. P values of canonical signalling pathways were calculated using Fisher's exact test. The NF-κB network diagram was generated using IPA.
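Pathway P values of this kind test whether a pathway contains more differentially expressed genes than expected by chance. The sketch below computes a right-tailed Fisher's exact (hypergeometric) P value with `math.comb`. That IPA uses the right-tailed form, and the choice of background gene set, are assumptions here. The function name and gene counts are hypothetical.

```python
from math import comb


def pathway_enrichment_p(overlap: int, n_de: int, n_pathway: int, n_background: int) -> float:
    """Right-tailed Fisher's exact (hypergeometric) P value for pathway over-representation.

    overlap: DE genes in the pathway; n_de: DE genes; n_pathway: pathway genes;
    n_background: all genes in the reference set.
    """
    if not (0 <= overlap <= min(n_de, n_pathway) and max(n_de, n_pathway) <= n_background):
        raise ValueError("inconsistent gene counts")
    total = comb(n_background, n_de)
    # Probability of observing `overlap` or more pathway genes among the DE genes
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_de - k)
        for k in range(overlap, min(n_de, n_pathway) + 1)
    )
    return tail / total


print(pathway_enrichment_p(3, 3, 3, 10))       # equals 1/120
print(pathway_enrichment_p(0, 50, 20, 1000))   # 1.0
```

Using exact integer arithmetic for the binomial coefficients avoids overflow and rounding in the numerator and denominator. Only the final division is done in floating point.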

Network diagram. The network diagram of protein-protein interactions was visualized with NetworkAnalyst (http://www.networkanalyst.ca) using the STRING interactome database (confidence score cutoff = 900). In the minimum network, interacting mediators and molecules were colour-coded according to their associated pathways for VEGF-B and TGFa.

Statistical analysis. Statistical analyses were performed with Prism software (GraphPad), using the statistical tests indicated in the individual figure legends. No samples were excluded. The investigators were blinded to the treatment of mice in individual experiments. P < 0.05 was considered significant. All error bars represent s.e.m. or s.d., as noted in the individual figure legends. Unless otherwise stated, three independent experiments were performed for all assays, and the displayed figures are representative. The experiments were not randomized, and no statistical methods were used to predetermine sample size.


Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. RNA-seq data have been uploaded and are accessible at https://figshare.com/s/c109f251b149b7a843b3.
Extended Data Fig. 1 | Contribution of AHR in CNS-resident and infiltrating immune cells during EAE. a, qPCR of indicated genes in microglia, splenic macrophages and astrocytes from control and CX3CR1-AHR mice on day 28 after EAE induction. n = 8 independent samples per group. b, Flow cytometry analysis of AHR expression in microglia, monocytes and astrocytes from control and CX3CR1-AHR mice 21 days after EAE induction. Thin line depicts isotype control, thick line AHR staining, and numbers indicate the percentage of AHR-positive cells. Representative of stainings of n = 3 mice per group. c, Spinal cord samples from naive control and CX3CR1-AHR mice were stained for Iba-1 and DAPI, and the number of Iba-1+ microglia per mm² was determined. n = 5 mice per group. n.s., not significant. d, TUNEL staining in Iba-1+ microglia in spinal cord sections of control and CX3CR1-AHR mice as in c. For the positive control, slides were heated at 98 °C in citrate buffer for 60 min in a vapour cooker. Solid arrows show TUNEL-positive microglia. Representative of n = 5 independent experiments. e, Number of CNS-infiltrating (top) and splenic T cells (bottom), and splenic pro-inflammatory monocytes (bottom), as determined by flow cytometry. n = 5 samples per group for CNS, n = 4 samples per group for spleen.

f, Proliferation assay of splenocytes isolated on day 28 of the experiment. n = 4 biologically independent samples per group, representative of two independent experiments. g, Bone marrow chimaeras were generated using irradiated wild-type mice as recipients, reconstituted with control or CX3CR1-AHR bone marrow. Recipients of bone marrow were then rested for 3 weeks and thereafter treated with weekly tamoxifen gavages (4 mg) for another 3 weeks; after a total of 6 weeks, EAE was induced and tamoxifen administration was continued weekly during EAE. Left, flow cytometry analysis of AHR expression in microglia and monocytes 21 days after EAE induction. Thin line depicts isotype control, thick line denotes AHR staining, and numbers indicate the percentage of AHR-positive cells. Representative of stainings of n = 3 independent mice per group. Right, EAE clinical course in bone marrow chimaera mice. n = 4 mice per group. h, Control and CX3CR1-AHR mice were treated with oral tamoxifen weekly starting from 5 weeks of age. EAE was induced at 8 weeks with continued weekly tamoxifen administration. Left, intracellular FACS staining for AHR in microglia and monocytes at day 21 of EAE. Representative of stainings of n = 3 independent mice per group. Right, clinical course of control and CX3CR1-AHR bone marrow chimaera mice. Data in a, c and e-h are mean ± s.e.m. of n = 4 mice per group. P values were determined by two-sided Student's t-test (a, c, e) or two-way ANOVA (g, h).
Extended Data Fig. 2 | Topical and molecular regulation of TGFa and VEGF-B. a, Ingenuity pathway analysis of differentially regulated pathways in astrocytes from n = 3 control versus CX3CR1-AHR mice per group during EAE. b, Tgfa and Vegfb expression determined by qPCR in microglia from brain, cerebellum and spinal cord 21 days after EAE induction (left). n = 8 mice per group. c, Predicted NF-κB- and AHR-responsive sites (NREs and XREs, respectively) in the Vegfb and Tgfa promoters. d, Microglia were isolated by FACS from control and CX3CR1-AHR mice during EAE. Ex vivo ChIP assay of NF-κB p65 or AHR binding to predicted binding sites in the Vegfb promoter. n = 3 mice per group. Data are representative of two independent experiments.

e, Reporter assay using a construct in which the Vegfb promoter controls
luciferase expression (pVegfb-Luc). Luciferase activity was measured in
HEK293 cells 24 h after transfection with pVegfb-Luc, pTK-Renilla, and
plasmids expressing AHR or NF-κB p65. Data are representative of two
independent experiments with four biological replicates. f, Ex vivo ChIP
assay as in d for AHR binding to the Tgfa promoter. n = 3 mice per group.
Representative of two independent experiments. g, Reporter assays using
a construct in which the Tgfa promoter controls luciferase expression
(pTgfa-Luc). Luciferase activity was measured in HEK293 cells 24 h after
transfection with pTgfa-Luc, pTK-Renilla, and plasmids expressing AHR
or control. Data are representative of two independent experiments with
three biological replicates. Data in a, d–g are mean ± s.e.m. P values were
determined by one-way ANOVA followed by Tukey's post-hoc test
(b, d, e, f) or two-sided Student's t-test (g).
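The reporter assays in panels e and g co-transfect pTK-Renilla alongside the firefly luciferase construct. The caption does not state how the readings were normalized; a common convention, assumed here, is to divide each firefly signal by its Renilla signal and then express the mean ratio of a condition relative to the control condition. The sketch below illustrates only that arithmetic; the function names and readings are hypothetical and are not data from this study.

```python
from statistics import mean


def normalized_activity(firefly: float, renilla: float) -> float:
    """Firefly luciferase signal divided by the co-transfected Renilla control."""
    if renilla <= 0:
        raise ValueError("Renilla signal must be positive")
    return firefly / renilla


def fold_over_control(sample_pairs, control_pairs) -> float:
    """Mean firefly/Renilla ratio of a condition relative to the control condition.

    Assumption: each element is a (firefly, renilla) reading from one replicate well.
    """
    sample = mean(normalized_activity(f, r) for f, r in sample_pairs)
    control = mean(normalized_activity(f, r) for f, r in control_pairs)
    return sample / control


# Hypothetical (firefly, Renilla) readings, four replicates per condition.
control_plasmid = [(1000, 500), (1200, 600), (900, 450), (1100, 550)]
ahr_plasmid = [(3000, 500), (3600, 600), (2700, 450), (3300, 550)]

print(fold_over_control(ahr_plasmid, control_plasmid))
```

With these placeholder numbers every AHR replicate has a ratio of 6.0 and every control replicate a ratio of 2.0, so the script prints 3.0, that is, a threefold induction over control.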


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


Extended Data Fig. 3 | TGFα and VEGF-B are regulated by AHR in
highly purified astrocytes and microglia. a, b, Mouse microglia were
activated with lipopolysaccharide (LPS) in the presence or absence of the
AHR inhibitor CH223191. After 24 h, activation medium was removed
and substituted with fresh medium after extensive washes. Then, 48 h
later, microglia conditioned medium (MCM) was collected and applied to
cultures of primary astrocytes. a, Gene expression in microglia 24 h after
activation in the presence or absence of CH223191. b, Gene expression
in astrocytes after 24 h exposure to MCM. Data are representative of
two independent experiments with three biological replicates.
c, Representative FACS stainings for CD11b and CD45 in primary
astrocyte and microglia cultures. Numbers indicate percentages in
respective gate. Data are representative of three independent experiments.
d, Representative FACS stainings for GFAP and GLAST in astrocyte
cultures as in b. Data are representative of three independent experiments.
e, f, qPCR analysis of mRNA expression in astrocyte and microglia
cultures. n = 4 independent cultures. Data are representative of two
independent experiments with four biological replicates. g, Effect of TGFα
and VEGF-B on gene expression in primary astrocytes activated with
TNF and IL-1, determined by qPCR after 24 h. Data are representative of
three independent experiments with three biological replicates.
h, i, Primary mouse astrocytes were activated with TNF and IL-1β and
treated with TGFα or VEGF-B. After 24 h, culture medium was
substituted by fresh medium after extensive washes. Then, 48 h later,
astrocyte conditioned medium (ACM) was added to mouse neurons (h)
and oligodendrocytes (i) in culture, and cytotoxicity was determined by
quantifying lactate dehydrogenase (LDH) release after 24 h. n = 3
biological replicates. Data are representative of two independent
experiments. j, CD11b+Ly6Chi monocyte migration assay performed
using ACM from astrocytes activated in the presence of TGFα or VEGF-B.
n = 4 biological replicates. Data are representative of two independent
experiments. k, qPCR analysis of Nos2 expression in microglia
co-cultured with astrocytes activated in the presence of TGFα or VEGF-B.
n = 3 biological replicates. Data are representative of two independent
experiments. Data in b, e–k are mean ± s.e.m. P values were determined
by two-sided Student's t-test (b, e, f) or one-way ANOVA followed by
Tukey's post-hoc test (g–k).
Extended Data Fig. 4 | Phenotypical and functional effects of
knockdown of microglial TGFα and VEGF-B. a, Quantification of
astrocyte numbers in spinal cord sections of knockdown mice. SOX9-positive
astrocytes per mm² were quantified in spinal cord sections of four
mice per group. b, IMARIS reconstruction of GFAP+ astrocytes in spinal
cord sections as in a, and quantification of dendrite length, branches,
volume, terminal points, and segments of n = 4 mice per group.
c, d, qPCR analysis of Tgfa and Vegfb expression in sorted CNS-infiltrating
inflammatory monocytes (c) and microglia (d) from mice injected with
pCD11b-shControl, pCD11b-shTgfa, and pCD11b-shVegfb 7 days after
EAE induction. Representative of two independent experiments with three
biological replicates. e, qPCR analysis of Erbb1 and Flt1 expression in mice
injected with pGFAP-shControl, pGFAP-shErbb1, and pCD11b-shFlt1 7
days after EAE induction. Representative of two independent experiments
with three biological replicates. f, Left, flow cytometry analysis of VEGF-B
and TGFα expression in microglia from mice injected with
pCD11b-shControl, pCD11b-shTgfa, and pCD11b-shVegfb 7 days after
EAE induction. Right, quantification of VEGF-B- and TGFα-positive
microglia in n = 5 mice per group. Representative of two independent
experiments with five biological replicates. g, Left, flow cytometry analysis
of FLT-1 and ErbB1 expression in astrocytes from mice injected with
pGFAP-shControl, pGFAP-shErbb1, and pCD11b-shFlt1 7 days after EAE
induction. Right, quantification of FLT-1- and ErbB1-positive microglia in
n = 5 mice per group. Representative of two independent experiments with
five biological replicates. h, Naive mice were injected with lysolecithin,
VEGF-B, or PBS into the corpus callosum by stereotaxic injection and,
6 days later, brains were analysed by myelin staining. Representative of
two independent experiments with five biological replicates. Data are
mean ± s.e.m. P values were determined by one-way ANOVA followed by
Tukey's post-hoc test (a–h).




LETTER 


[Extended Data Fig. 5 panels: a, b, EAE clinical course versus time after immunization (days) in mice injected with pGFAP-shControl, pGFAP-shVegfb or pGFAP-shTgfa (a) and pCD11b-shControl, pCD11b-shFlt1 or pCD11b-shErbb1 (b); c, d, flow cytometry plots of CD11b (PE-Cy7) against TGFα, VEGF-B, FLT-1 and ErbB1 in the corresponding knockdown groups, with quantification.]


Extended Data Fig. 5 | Directionality of TGFα and VEGF-B signalling
during EAE. a, b, EAE development in wild-type mice injected with
pGFAP-shControl, pGFAP-shErbb1 and pCD11b-shFlt1 (a), or
pCD11b-shControl, pCD11b-shTgfa and pCD11b-shVegfb (b). Clinical
course. n = 5 mice per group. Representative of two independent
experiments with n = 5 mice per group. c, Left, flow cytometry
analysis of TGFα and VEGF-B expression in astrocytes as in a. Right,
quantification of cytokine-positive astrocytes. Data are mean ± s.e.m.
and P values were determined by one-way ANOVA followed by Tukey's
post-hoc test. Representative of two independent experiments with four
biological replicates. d, Left, flow cytometry analysis of FLT-1 and ErbB1
expression in microglia as in b. Right, quantification of cell-surface
receptor expression of microglia. Data are mean ± s.e.m. and P values
were determined by one-way ANOVA followed by Tukey's post-hoc test.
Representative of two independent experiments with four biological
replicates.
Extended Data Fig. 6 | Regulation and transcriptional effects of TGFα
and VEGF-B during EAE. a, b, NanoString analysis of mRNA expression
in astrocytes from EAE mice injected with pCD11b-shVegfb or
pCD11b-shTgfa (a) and pGFAP-shFlt1 or pGFAP-shErbb1 (b; see also
Fig. 2k, l). Fold change in expression relative to control was
determined as log2(shKD/shControl). shKD, shRNA knockdown.
Representative of two independent experiments with pooled RNA
isolated from n = 3 mice per group. c, Principal component analysis of
gene expression in astrocytes isolated as in a and b. Representative of
two independent experiments with pooled RNA isolated from n = 3
mice per group. d, Ingenuity pathway analysis of significantly regulated
pathways from astrocytes as in a and b. Representative of two independent
experiments with pooled RNA isolated from n = 3 mice per group. e, Left,
representative flow cytometry plots depicting NF-κB p65 phosphorylation
in wild-type astrocytes stimulated for 15 min with vehicle (top) or TNF or
IL-1 (bottom) in the presence of TGFα, VEGF-B, or their combination.
Numbers indicate percentage of FITC+ cells. Bar graphs depict
quantification of FITC+ cells. Data are mean ± s.e.m. and
P values were determined by one-way ANOVA followed by Tukey's
post-hoc test. Representative of two independent experiments with four
biological replicates. f, Primary mouse astrocytes were exposed to VEGF-B
or vehicle and a pharmacological blocker of NF-κB activation. RNA was
obtained after 18 h and subjected to qPCR analyses for the indicated genes.
Data are mean ± s.e.m. and P values were determined by one-way ANOVA
followed by Tukey's post-hoc test. Representative of two independent
experiments with three biological replicates. g, Primary mouse astrocytes
were activated with TNF or IL-1 in the presence of VEGF-B or vehicle,
and a pharmacological blocker of NF-κB activation. RNA was obtained
after 18 h and subjected to qPCR analyses for the indicated genes. Data are
mean ± s.e.m. and P values were determined by one-way ANOVA followed
by Tukey's post-hoc test. Representative of two independent experiments
with n = 3 biological replicates.
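The NanoString fold changes in panels a and b are reported as log2(shKD/shControl). As a small illustration of that formula, the sketch below computes it for hypothetical normalized expression values: positive values mean higher expression after knockdown, negative values lower expression, and 0 means no change. The gene labels and counts are placeholders, not data from this study.

```python
import math


def log2_fold_change(sh_kd: float, sh_control: float) -> float:
    """Return log2(shKD / shControl) for a single gene.

    Both arguments are assumed to be strictly positive normalized
    expression values (for example, normalized NanoString counts).
    """
    if sh_kd <= 0 or sh_control <= 0:
        raise ValueError("expression values must be positive to take a log ratio")
    return math.log2(sh_kd / sh_control)


# Hypothetical normalized counts as (shKD, shControl); not data from this study.
counts = {
    "gene_up": (200.0, 100.0),    # twofold higher after knockdown
    "gene_down": (25.0, 100.0),   # fourfold lower after knockdown
    "gene_flat": (100.0, 100.0),  # unchanged
}

for gene, (kd, ctrl) in counts.items():
    print(gene, log2_fold_change(kd, ctrl))
```

Running it prints 1.0, -2.0 and 0.0 for the three placeholder genes, showing that a twofold change maps to ±1 on the log2 scale, which keeps up- and downregulation symmetric around zero.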
Extended Data Fig. 7 | Role of AHR in astrocytes and microglia during
EAE. a–c, EAE was induced in control (WT), Gfap’ Ahr" (GFAP-AHR),
or CX3CR1-AHR mice. Starting from day 7, mice were injected daily
intraperitoneally with indoxyl-3-sulfate (I3S), given a tryptophan-depleted
diet (TDD), or kept on a control diet. Clinical course of EAE mice under
treatment conditions as indicated. Representative of two independent
experiments with n = 4 mice per group. d–f, EAE was induced in
wild-type mice, which were treated with lentiviruses to knock down
AHR in astrocytes (pGFAP-shAhr) or microglia (pCD11b-shAhr). A
noncoding RNA was used as a control. AHR expression in astrocytes and
microglia was quantified by flow cytometry. d, Representative
histograms of n = 4 mice per group. Numbers indicate percentage of
AHR-positive cells; thin lines denote isotype control, thick lines denote
AHR staining. e, Quantification of AHR-positive astrocytes and microglia
as in d. Representative of two independent experiments with four
biological replicates. f, EAE mice with knockdown of AHR in astrocytes,
microglia, or both as in d were subjected to daily I3S injections, TDD, or
control diet conditions starting on day 14 after disease induction. Clinical
course of n = 4 mice per group. Representative of two independent
experiments with n = 4 mice per group. g, Quantification of CNS-infiltrating
pro-inflammatory monocytes as determined by FACS at day
28 of EAE. Representative of two independent experiments with three
biological replicates. Data are mean ± s.e.m. P values were determined by
two-way ANOVA (a, f), or one-way ANOVA followed by Tukey's post-hoc
test (e, g).
Extended Data Fig. 8 | Dietary factors influence mouse and human
TGFα and VEGF-B expression. a, Ingenuity pathway analysis of NF-κB
signalling comparing a TDD to a TDD plus Trp diet in control animals.
Colours code for up- and downregulation of individual members in red
(up) and blue (down). Normalized reads of two independent samples per
group. b, mRNA expression determined by qPCR in EAE mice as in
Fig. 3a. Data are representative of two independent experiments with three
replicates. c, Quantification of co-expression of AHR and CD14, VEGF-B
and CD14, and TGFα and CD14 in immunofluorescence stainings of human
white matter brain tissue of NAWM, active, or chronic MS lesions for AHR
(left), VEGF-B (middle), or TGFα (right), CD14 (green), and DAPI (blue).
Data are representative of n = 12 fields from three distinct MS brains.
d, Ratio of VEGF-B to TGFα intensities. Data are the ratio of mean values
from Fig. 4e ± s.e.m. of n = 25 fields. Data in b and d are mean ± s.e.m.
P values derived by one-way ANOVA followed by Tukey's post-hoc
test (b, d).
Mechanism of phosphoribosyl-ubiquitination
mediated by a single Legionella effector 


Anil Akturk*, David J. Wasilko*, Xiaochun Wu, Yao Liu, Yong Zhang, Jiazhang Qiu, Zhao-Qing Luo, Katherine H. Reiter,
Peter S. Brzovic, Rachel E. Klevit & Yuxin Mao


Ubiquitination is a post-translational modification that regulates 
many cellular processes in eukaryotes'~*. The conventional 
ubiquitination cascade culminates in a covalent linkage between 
the C terminus of ubiquitin (Ub) and a target protein, usually on 
a lysine side chain». Recent studies of the Legionella pneumophila 
SidE family of effector proteins revealed a ubiquitination method in 
which a phosphoribosyl ubiquitin (PR-Ub) is conjugated to a serine 
residue on substrates via a phosphodiester bond~®. Here we present 
the crystal structure of a fragment of the SidE family member SdeA 
that retains ubiquitination activity, and determine the mechanism 
of this unique post-translational modification. The structure reveals 
that the catalytic module contains two distinct functional units: a 
phosphodiesterase domain and a mono-ADP-ribosyltransferase 
domain. Biochemical analysis shows that the mono-ADP- 
ribosyltransferase domain-mediated conversion of Ub to ADP- 
ribosylated Ub (ADPR-Ub) and the phosphodiesterase domain- 
mediated ligation of PR-Ub to substrates are two independent 
activities of SdeA. Furthermore, we present two crystal structures 
of a homologous phosphodiesterase domain from the SidE family 
member SdeD” in complexes with Ub and ADPR-Ub. The structures 
suggest a mechanism for how SdeA processes ADPR-Ub to PR-Ub 
and AMP, and conjugates PR-Ub to a serine residue in substrates. 
Our study establishes the molecular mechanism of phosphoribosyl- 
linked ubiquitination and will enable future studies of this unusual 
type of ubiquitination in eukaryotes. 

A variety of microbial pathogens exploit the eukaryotic ubiquitination
pathway during their respective infections!°'!. The intracellular
pathogen L. pneumophila injects more than 300 effectors into host cells
during its infection, including at least ten proteins that are involved in
ubiquitin manipulation”. These effectors include HECT-like!*“ and
F- or U-box-containing Ub ligases'>-'* as well as novel Ub ligases of
the SidE family, such as SdeA, that act independently of canonical E1
and E2 enzymes®*. SdeA first uses its mono-ADP-ribosyltransferase
(mART) activity to catalyse the transfer of ADP-ribose from NAD+
to the side chain of R42 on Ub to generate ADPR-Ub. Subsequently,
SdeA uses its phosphodiesterase (PDE) activity to catalyse the
conjugation of ADPR-Ub to a serine residue on substrates to generate a
protein–PR-Ub product. Alternatively, in the absence of substrates, the
SdeA PDE domain catalyses the hydrolysis of ADPR-Ub to generate
PR-Ub and AMP (Fig. 1a, Extended Data Fig. 1). The molecular
mechanism of this unique ubiquitination pathway is still unknown.

To determine the mechanism of phosphoribosyl-linked ubiquitination,
we determined the crystal structure of a portion of SdeA (amino
acids 211–910, hereafter called SdeA-core; Extended Data Table 1). The
structure is composed of two distinct domains, the PDE and mART
domains (Fig. 1b, c). A calculation of the surface electrostatic potential
revealed no notably charged areas on the surface of SdeA other than a
deep and highly positively charged groove on the PDE domain (Fig. 1d, e).
Analogous to other PDEs", the active site is likely to be harboured
in this deep groove (Extended Data Fig. 2a–c). Indeed, a sequence
alignment of PDE domains showed that most of the conserved residues
reside in this groove, consistent with their forming the PDE active site
(Extended Data Figs. 2d, 3). The mART domain is composed of two
lobes, an N-terminal α-helical lobe (amino acids 592–758) and a main
lobe (amino acids 759–911). The main lobe contains a β-sandwich core
and harbours the three catalytic motifs: the (F/Y)-(R/H), STS and EXE
motifs (Extended Data Figs. 4a–f, 5) that are conserved in other mART
proteins, such as the Pseudomonas syringae effector HopU1 and the
Clostridium perfringens iota-toxin””*”. A structural comparison of
the α-helical lobe with its counterparts in other mARTs revealed that
although the total number and the length of α-helices are variable,
three α-helices form a structural core that is conserved in most mART
proteins (Extended Data Fig. 4g–i). Although it packs in close contact
with the main lobe in other mARTs, the α-helical lobe is extended
away from the main lobe in our SdeA-core crystal structure (Extended
Data Fig. 6a, b). The extended conformation observed in our crystal
structure is consistent with the conformation in solution as detected
by small-angle X-ray scattering (SAXS) and does not change in the
presence of NAD+ (Extended Data Fig. 6c–f). However, the α-helical
lobe adopts a closed conformation and mediates contact with NAD+ in
a structure of iota-toxin!. Moreover, the α-helical lobe is enriched with
highly conserved residues (including N723, Q727 and R729) that form
a cluster on its surface, as revealed by an analysis of surface residue
conservation using the ConSurf server’ (Extended Data Figs. 5, 7a). Thus,
we hypothesized that the α-helical lobe may have a similar role in SdeA
catalysis. Indeed, an α-helical lobe deletion in SdeA (SdeA-Δα-lobe),
as well as N723A, Q727A or R729A point mutations in the α-helical
lobe, completely abrogated ADP-ribosylation activity (Extended Data
Fig. 7b, c). A mutation in a residue that is not conserved but is close
to the conserved surface patch (F719A) substantially impaired
activity, whereas mutation of a conserved residue that is away
from the patch (D622A) resulted in an activity level comparable to
wild-type SdeA. Taken together, our data show that the α-helical lobe
is crucial for ADP-ribosylation of Ub, and that a surface patch
composed of highly conserved residues may mediate the binding of NAD+
during catalysis. These observations further suggest that the closed
conformation of the α-helical lobe is required for the mART activity
of SdeA. An accompanying paper describing the crystal structure of a
longer construct of SdeA in complex with both NAD+ and Ub reports
that the α-helical lobe is indeed observed in a closed conformation”*.
The main lobe of the mART domain is packed against the PDE
domain in the SdeA structure. The two catalytic sites face in opposite
directions and are separated by a distance of over 55 Å (Fig. 1b),
which raises the question of how the activities of the two domains are
coordinated. To address this question, we performed assays with SdeA
fragments that retain only mART or PDE activity (Fig. 2a). Similar
to wild-type SdeA-core, reactions that contain both SdeA-PDE and
SdeA-mART efficiently generate PR-Ub and ubiquitinate the substrate
RAB33B (Fig. 2b, c). SdeA-core carrying a mutation (H277A) in the
PDE active site retained the ability to generate ADPR-Ub but failed
to process ADPR-Ub to PR-Ub or to ubiquitinate RAB33B. However,
the presence of both SdeA-core(H277A) and SdeA-PDE successfully
catalysed both the production of PR-Ub and the ubiquitination of RAB33B.
Moreover, SdeA-PDE alone can catalyse phosphoribosyl-linked
ubiquitination of RAB33B when purified ADPR-Ub is supplied (Fig. 2d).
The independence of the two activities was further validated by
SdeA-mediated RAB33B ubiquitination when the PDE and mART domains
were co-expressed in cells (Fig. 2e). These results suggest that
ADP-ribosylation of Ub and phosphoribosyl-linked ubiquitination of serine
are mechanistically and spatially independent activities performed by
a single protein.

Fig. 1 | Overall structure of SdeA. a, Schematic of the phosphoribosyl-ubiquitination
reaction. b, Overall structure of SdeA-core in ribbon
representation. This portion of SdeA has two distinct domains: the PDE
(green) and mART (gold) domains. The active site residues of both the
mART and PDE domains are shown as red spheres. The linear distance
between these two active sites is approximately 55 Å. c, An orthogonal view
of b. d, Molecular surface model of SdeA. The surface is coloured on the
basis of electrostatic potential with positively charged regions in blue and
negatively charged surfaces in red. The orientation of the molecule is the
same as shown in b. e, An orthogonal view of d.

Despite sharing 23% sequence similarity with a well-characterized 
cyclic di-3’,5’-GMP phosphodiesterase in Pseudomonas aeruginosa 
PA478 17°, the PDE domain of SdeA uses ADPR-Ub as its substrate 
and catalyses the unprecedented phosphoribosyl-linked ubiquitination 
of serine. To understand how ADPR-Ub is recognized by the SdeA PDE
domain, we assessed the interaction of Ub and several homologous PDE
domains from the Legionella SidE-effector family using 1H-15N HSQC
TROSY (heteronuclear single quantum coherence, transverse
relaxation-optimized spectroscopy) NMR titration experiments (Extended
Data Fig. 8a–c). The SdeA PDE domain showed no detectable
interaction with Ub in solution, whereas the PDE domain of another SidE
family member, SdeD, exhibited a direct and specific interaction with
Ub as evidenced by NMR-peak perturbations. We then successfully
determined the structures of SdeD, both on its own and in complex
with Ub (Extended Data Fig. 8d–f). Notably, two Ub molecules are in
contact with a single PDE domain in the crystal. One Ub (Ub2) binds
on the side opposite the catalytic groove, making the physiological
significance of this binding mode unclear (Extended Data Fig. 8g). The
other Ub (Ub1) binds to a flat surface at the opening of the catalytic
groove (Fig. 3a). Similar to the Ub surface area mapped by NMR
titration experiments in solution (Extended Data Fig. 8c), three regions of
Ub1 contact the PDE domain: the loop region around residue T9, the
C terminus and a region that includes R42 (Fig. 3a). At the T9 loop
region, in addition to the hydrophobic interactions mainly contributed
by L8, residue K6 of Ub1 forms electrostatic interactions with E251 on
SdeD (Fig. 3b). At the C terminus of Ub1, in addition to hydrophobic
interactions mediated by L73, R72 of Ub1 forms salt bridges with E242
on SdeD (Fig. 3c). Notably, the R42 side chain of Ub1 extends into the
catalytic groove and forms hydrogen bonds and electrostatic
interactions with the conserved residues Q52 and E126 at the PDE catalytic
site (Fig. 3d). To test whether the PDE domain of SdeA interacts with
Ub in a manner that is similar to SdeD, we modelled Ub binding by the
PDE domain of SdeA on the basis of the SdeD–Ub1 complex (Fig. 3e).
The model predicts that E465 and E454 in SdeA would have roles in Ub
binding analogous to those of E251 and E242 in SdeD, respectively (Fig. 3a, e).
Consistent with this prediction, PDE activity was substantially impaired
in SdeA E465A and E454A mutants, as evidenced by the marked
reduction of both the Pro-Q staining signal and ubiquitination of RAB33B
(Fig. 3f, g). In addition, a V414Y mutant designed to sterically block
the access of ADPR-Ub to the catalytic site also largely impaired the
PDE activity (Fig. 3e–g). All three SdeA mutants were able to cause a
band shift of Ub on native gels (Fig. 3e, top), indicating that the mART
activity of these mutants remained intact. Together, these data support
the notion that the SdeA PDE domain recognizes Ub in a manner that
is similar to the strategy observed for SdeD, although the interaction is
markedly weaker, as evidenced by the NMR-titration analysis.

To further address the question of how the ADPR moiety of ADPR-Ub 
fits in the active-site groove of the PDE domain, we determined the 
structure of a catalytically inactive SdeD mutant (H67A) in complex 
with ADPR-Ub. The binding mode of ADPR-Ub is similar to Ub1 
with the ADPR moiety nestled in the catalytic groove (Extended Data 
Fig. 9a–d). ADPR sits atop several invariant residues, including H67A,
H189 and E126, and engages in extensive interactions with a large
number of conserved residues within the catalytic groove (Fig. 4a–c,
Extended Data Fig. 9e). To test the role of the ADPR-interacting res- 
idues within the catalytic groove, we mutated several corresponding 
residues in SdeA. PDE activity was completely abolished in the H277A, 
H407A, and E340A mutants, as indicated by the lack of both the Pro-Q 
staining signal and RAB33B ubiquitination (Fig. 4c, d). The activity of 
the R413A mutant was substantially impaired, whereas H281A and 
W394A mutations showed little or no effect on PDE activity. 

Based on our results, we propose a two-step reaction mechanism
for the transfer of PR-Ub to a substrate (Fig. 4e). In the first step,
negatively charged E340 helps to position R42 of ADPR-Ub and H277.
This interaction could enhance the nucleophilicity of H277 through
induction. H277 attacks the β-phosphate of ADPR to form a transient
phosphoramidate bond with PR-Ub. The presence of this transient
intermediate is supported by biochemical evidence reported in an
accompanying paper”*. The nearby H407 residue functions as a general
acid to donate a proton to the α-phosphate of the departing AMP
molecule. The underlying mechanism of this step is similar to that of
histidine protein kinases*””*. In the second step, H407 deprotonates
the hydroxyl group of a serine residue on the approaching substrate.
The activated hydroxyl group then attacks the phosphoryl group to
form a stable phosphoserine linkage between the substrate protein
and PR-Ub. The protonated E340 then functions as a general acid to
protonate H277, thereby regenerating the enzyme to its initial state.
Alternatively, if a water molecule serves as the Ub acceptor in the
second step, the reaction results in the cleavage of ADPR-Ub to PR-Ub.
Modification of Ub to yield PR-Ub has not, to our knowledge, been
reported in (non-infected) eukaryotes. However, many Legionella
effector proteins have eukaryotic origins evolutionarily”’, raising the
possibility that eukaryotes also harbour an equivalent machinery that
may be encoded in multiple polypeptides, as the mART and PDE
activities are functionally independent. Future investigation of such
a eukaryotic enzyme system will advance our understanding of the
versatile Ub code.

Fig. 2 | ADP-ribosylation of Ub and phosphoribosyl-linked
ubiquitination of serine are two independent activities of SdeA.
a, Schematic of SdeA constructs. SdeA has an N-terminal deubiquitinase
(DUB) domain, followed by PDE, mART and C-terminal coiled-coil
(CC) domains. b, In vitro Ub-modification assays. The modification of
Ub to ADPR-Ub or PR-Ub was monitored by the band-shift of Ub in
native PAGE with Coomassie staining (top). The production of PR-Ub
was visualized by SDS-PAGE and phosphoprotein staining with Pro-Q
Diamond (bottom). ADPR-Ub and PR-Ub migrate at the same position
on a native gel (labelled as modified Ub); however, only PR-Ub is visible
by Pro-Q phosphoprotein stain. c, In vitro phosphoribosyl-ubiquitination
assay of RAB33B by the indicated SdeA proteins. IB, immunoblot. d, In
vitro phosphoribosyl-ubiquitination assay of RAB33B in the presence
of purified ADPR-Ub. e, Intracellular-ubiquitination assays of RAB33B
by SdeA. Data shown in b–d are representative of four independent
experiments. GFP, green fluorescent protein. e, Similar results were
obtained from three independent experiments. b–e, Uncropped gels and
blots are shown in Supplementary Fig. 1.

Fig. 3 | The interaction between Ub and the PDE domains of SdeD and
SdeA. a, Overall view of the binding of Ub (Ub1) with the PDE domain
of SdeD. The PDE domain residues within van der Waals distance of Ub1
are coloured in light blue. Three interacting regions of Ub1 that contact
SdeD are marked by dashed outlines. b–d, Expanded views of the three
Ub1–SdeD interacting regions outlined in a. e, Surface representation
of the PDE domain of SdeA. Ub binding was modelled on the SdeD–Ub1
complex structure and the potential Ub-interacting surface is highlighted
in dark green. Three key residues (E465, E454 and V414) at the potential
Ub-interacting interface are shown in stick representation. The PDE active
site is shown in red. f, g, In vitro Ub-modification (f) and
phosphoribosyl-ubiquitination assays (g) of SdeA mutants at the potential
Ub-interacting interface. The modification of Ub and phosphoribosyl-linked
ubiquitination were monitored as described in Fig. 2b, c. Data shown in f
and g are representative of four independent experiments. Uncropped gels
and blots are shown in Supplementary Fig. 1. WT, wild type.

Fig. 4 | Structure of the complex formed by ADPR-Ub and the PDE
domain of SdeD. a, Surface representation of ADPR-Ub (cyan) in
complex with the SdeD PDE domain (grey). The catalytic site is coloured
in red. The ADPR moiety is coloured in light green and shown in stick
(left) and surface (right) representation. b, Detailed view of the interactions
of the ADPR moiety with residues of the PDE domain. SdeD residues involved
in ADPR binding are labelled and the corresponding residues in SdeA
are labelled in parentheses. In the structure, H67 is substituted with
METHODS


No statistical methods were used to predetermine sample size. The experiments 
were not randomized and investigators were not blinded to allocation during 
experiments and outcome assessment. 

Cloning and mutagenesis. DNA fragments encoding the SdeA-core and 
SdeD(A1-341) were amplified from L. pneumophila genomic DNA. The PCR 
products were digested with BamHI and XhoI restriction enzymes and inserted
into a pET28a-based vector in-frame with an N-terminal 6×His-SUMO tag for
protein overexpression in bacterial cells. Amino acid substitutions of SdeA and
SdeD were introduced by site-directed mutagenesis using oligonucleotide primer 
pairs containing the appropriate base changes. The Ub gene was subcloned into a 
pET21a vector. All constructs were confirmed by DNA sequencing. 

Protein expression and purification. Relevant plasmids (containing Legionella 
protein constructs or RAB33B) were transformed into E. coli BL21(DE3) cells. 
Cultures derived from single colonies were grown in Luria-Bertani medium
supplemented with 50 µg ml−1 kanamycin or 100 µg ml−1 ampicillin to mid-log phase.
Protein expression was induced with 0.1 mM isopropyl-β-D-thiogalactopyranoside
(IPTG) for 12 h at 18°C. Collected cells were resuspended in a lysis buffer
containing 20 mM Tris-HCl (pH 8.0) and 150 mM NaCl and lysed by sonication. 
Insoluble cellular debris was pelleted by centrifugation at 31,000g for 30 min at 
4°C, and the clarified lysate was incubated with cobalt resin (Gold-Bio) for 1.5h 
at 4°C. Proteins bound to the resin were extensively washed with lysis buffer. The 
SUMO-specific protease Ulp1 was then added to the resin slurry to release the 
expressed protein from the His-SUMO tag. Eluted protein samples were further 
purified by fast protein liquid chromatography (Superdex 16/60, GE Lifesciences) 
in 150mM NaCl, 20 mM Tris pH 7.5. Peak fractions were collected, pooled and 
concentrated. Protocols for Ub expression and purification were adapted from the 
published literature*”. In brief, collected cells were resuspended in 20 mM ammo- 
nium acetate, pH 5.1. Cells were lysed by sonication and cell lysate was clarified 
by centrifugation (31,000g for 30 min). The pH of the clarified lysate was lowered 
to 4.8 using glacial acetic acid. The decrease in pH caused the lysate to turn milky 
white (a result of precipitated proteins), and the solution was again centrifuged 
at 31,000g for 30 min at 4°C to remove the precipitated protein fraction. The pH 
of the remaining soluble fraction was adjusted to 5.1 by the addition of NaOH. 
The soluble fraction was then loaded onto a HiTrap SP cation exchange column 
(GE Healthcare) in 20 mM ammonium acetate pH 5.1, and eluted in a continuous 
gradient of 500 mM ammonium acetate pH 5.1. Fractions containing the ubiquitin 
peak were pooled and further purified using size exclusion chromatography in 
150mM NaCl, 20 mM Tris pH 7.5. Ubiquitin-containing fractions were pooled 
and concentrated. 

To generate ADPR-Ub for both biochemical assays and crystallographic trials, 14 μM SdeA-core H277A (which lacks PDE activity) was incubated with 25 μM Ub and 1 mM NAD+ for 1 h at 37 °C. ADPR-Ub was purified by size exclusion chromatography in 150 mM NaCl, 20 mM Tris (pH 7.5).
Protein crystallization. All protein crystallization screens were performed with a Crystal Phoenix liquid handling robot (Art Robbins Instruments) at room temperature. Conditions that yielded initial crystals in the screens were further optimized by hanging-drop vapour diffusion, mixing 1.5 μl of protein with an equal volume of reservoir solution.

Specifically, for SdeA-core crystallization, SdeA-core protein was concentrated to 12 mg ml−1 and crystallized in 100 mM HEPES pH 7.9, 12% PEG 8000. Thin plate-shaped crystals appeared in about two weeks. For SdeD crystallization, SdeD was concentrated to 14 mg ml−1 and crystallized in 200 mM CaCl2, 100 mM MES pH 5.5, 18% PEG 6000 and 100 mM DTT. Cube-shaped crystals formed within two to three days. To generate the SdeD-Ub crystals, SdeD(A1-341) was mixed with wild-type Ub at a 1:5 molar ratio, with a final SdeD concentration of 8 mg ml−1. Rod-shaped crystals formed in 200 mM NaCl, 100 mM imidazole pH 7.0 and 24% PEG 8000.

We also obtained crystals of a catalytically inactive SdeD H67A mutant with purified ADPR-Ub. However, those crystals diffracted poorly, probably because the ADPR moiety at the Ub2 site mediated conflicting crystal packing contacts. We therefore attempted to crystallize the SdeD PDE domain with a mixture of ADPR-Ub and unmodified Ub in a 1:2:3 molar ratio (SdeD:ADPR-Ub:Ub) and a final SdeD concentration of 12 mg ml−1. We expected ADPR-Ub to bind with higher affinity at the Ub1 site, leaving unmodified Ub to occupy the Ub2 site and satisfy crystal packing constraints. Rod-shaped crystals appeared within one day in a solution containing 100 mM sodium cacodylate pH 6.7 and 21% PEG 8000. This strategy yielded diffraction-quality crystals in which ADPR-Ub is bound at the Ub1 site and unmodified Ub is bound at the Ub2 site.

X-ray diffraction data collection and processing. Diffraction datasets for SdeA-core, the SdeD-Ub complex and the SdeD-Ub-ADPR-Ub complex were collected at MacCHESS beamline F1 of the Cornell synchrotron light source. Datasets for SdeD crystals were collected at beamline A1. Before data collection, all crystals were soaked in cryoprotectant solutions consisting of their respective crystallization buffers supplemented with 20% glycerol, and were flash frozen in a stream of liquid nitrogen. All datasets were indexed, integrated and scaled with HKL-2000.
Structure determination and refinement. The structure of SdeA-core was solved by single-wavelength anomalous dispersion (SAD). Before data collection, SdeA-core crystals were soaked for 5 min at room temperature in cryoprotectant (0.1 M HEPES pH 7.9, 12% PEG 8000 and 25% (v/v) glycerol) supplemented with 10 mM ethylmercury chloride. Heavy-atom sites were determined and initial phases were calculated using the program HKL2MAP. The structure of the PDE domain of SdeD was solved by SAD phasing with selenomethionine-incorporated SdeD crystals. The structures of the SdeD-Ub and SdeD-Ub-ADPR-Ub complexes were solved by molecular replacement with the program AMoRe of the CCP4 suite, using the apo SdeD structure as the search model. For all datasets, iterative cycles of model building and refinement were carried out with Coot and REFMAC5 of the CCP4 suite.

NMR titration analysis. All NMR spectra were collected on a Bruker 500 MHz DMX spectrometer at 25 °C. Data were processed using NMRPipe and analysed using NMRViewJ. NMR samples were prepared in 25 mM NaPi, 150 mM NaCl buffer at pH 7.0 with 5% (v/v) D2O. For all NMR experiments, the concentration of ¹⁵N-Ub or ADPR-Ub was held at 150 μM; in ADPR-Ub, only the Ub subunit was isotopically labelled. Concentrations of the other protein components varied from 35 to 300 μM. Two independent experiments were performed for the ¹⁵N-Ub + SdeA PDE domain complex, each using different stocks of Ub and PDE. To monitor the interaction between SdeD and Ub, four separate samples were prepared with a fixed Ub concentration and different SdeD concentrations (Ub = 150 μM; SdeD = 37.5, 75, 150 and 300 μM).
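At a fixed 150 μM Ub, the SdeD concentrations above correspond to SdeD:Ub molar ratios from 0.25 to 2. The short Python sketch below only computes these ratios from the stated concentrations. The helper name `molar_ratios` is hypothetical and is not part of NMRPipe or NMRViewJ.

```python
# Hypothetical helper (not part of NMRPipe or NMRViewJ): partner:Ub molar
# ratios for the SdeD titration series described in the Methods.
UB_UM = 150.0                          # 15N-Ub concentration in uM, held constant
SDED_UM = [37.5, 75.0, 150.0, 300.0]   # SdeD concentrations in uM


def molar_ratios(partner_um, ub_um=UB_UM):
    """Return the partner:Ub molar ratio for each titration sample."""
    return [c / ub_um for c in partner_um]


if __name__ == "__main__":
    for conc, ratio in zip(SDED_UM, molar_ratios(SDED_UM)):
        print(f"SdeD {conc:5.1f} uM -> SdeD:Ub = {ratio:.2f}")
```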

SAXS data collection. SAXS experiments were performed on beamline 4-2 at the Stanford Synchrotron Radiation Lightsource (SSRL). Concentrated SdeA-core protein samples were buffer exchanged into 20 mM HEPES pH 7.5, 150 mM NaCl, and stored at 4 °C before data collection. For online SEC-SAXS, 50 μl of SdeA-core (7 mg ml−1) was injected onto a Superdex 200 Increase PC 3.2/30 column (GE Healthcare) in buffer containing 20 mM HEPES pH 7.5, 150 mM NaCl, 5 mM DTT and 0.02% NaN3, at a flow rate of 0.05 ml min−1. Data were collected using a Pilatus3 X 1M detector with a 2.5-m sample-to-detector distance and an X-ray beam energy of 12.4 keV (wavelength λ = 1 Å), with 1-s exposures collected every 5 s. The first 100 images were averaged as buffer scattering data and subtracted from the corresponding protein scattering data. SAXS patterns, the radius of gyration (Rg), the maximal particle dimension (Dmax), the pairwise distance distribution histogram (P(r) plot) and the Kratky plot were analysed using the ATSAS software suite. The AllosMod-FOXS server was used to compare solution and X-ray structure conformations. The X-ray-determined ‘open’ structure and a modelled ‘closed’ conformation were used as input structures. Using MODELLER, AllosMod generated 100 static structures similar to the input X-ray-determined (open) or modelled (closed) structures of SdeA-core. Theoretical SAXS profiles were calculated and compared against the raw SAXS data using FOXS rigid-body modelling as previously described, with a maximal q value of 0.25. The mean and s.d. of χ² among the five best-fitting models were examined for fit comparisons.
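The sketch below illustrates, under simplifying assumptions, three calculations mentioned in this paragraph:

- the beam wavelength implied by a 12.4 keV energy, using λ = hc/E with hc ≈ 12.398 keV·Å;
- subtraction of the averaged first 100 frames as buffer;
- scoring of model profiles by a reduced χ² after fitting a single scale factor, reporting the mean and s.d. over the five best models.

This is not the FOXS or ATSAS implementation. FOXS can also fit excluded-volume and hydration-layer parameters, and its exact χ normalization may differ. All function names are hypothetical, NumPy is assumed, and the sample s.d. (ddof=1) is an assumed convention.

```python
import numpy as np

HC_KEV_ANGSTROM = 12.398  # Planck constant x speed of light, in keV*Angstrom (approximate)


def wavelength_angstrom(energy_kev):
    """X-ray wavelength in Angstrom from photon energy in keV (lambda = hc / E)."""
    return HC_KEV_ANGSTROM / energy_kev


def buffer_subtract(frames, n_buffer=100):
    """Average the first n_buffer frames as buffer and subtract it from all frames.

    frames: array of shape (n_frames, n_q), one I(q) curve per exposure.
    """
    frames = np.asarray(frames, dtype=float)
    buffer_curve = frames[:n_buffer].mean(axis=0)
    return frames - buffer_curve


def reduced_chi2(i_exp, sigma, i_calc):
    """Reduced chi^2 between data and a model after an optimal scale factor c.

    c minimizes sum(((i_exp - c * i_calc) / sigma) ** 2) in closed form.
    Returns (chi2, c).
    """
    w = 1.0 / sigma**2
    c = np.sum(w * i_exp * i_calc) / np.sum(w * i_calc**2)
    resid = (i_exp - c * i_calc) / sigma
    return np.sum(resid**2) / i_exp.size, c


def best_model_stats(q, i_exp, sigma, model_profiles, q_max=0.25, top=5):
    """Score each model on q <= q_max; return mean and s.d. of the `top` lowest chi^2."""
    mask = q <= q_max
    scores = sorted(
        reduced_chi2(i_exp[mask], sigma[mask], profile[mask])[0]
        for profile in model_profiles
    )
    best = np.array(scores[:top])
    return best.mean(), best.std(ddof=1)


if __name__ == "__main__":
    print(f"12.4 keV -> {wavelength_angstrom(12.4):.4f} Angstrom")
```

Because the scale factor has a closed-form least-squares solution, no iterative optimizer is needed to rank many AllosMod-style models.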

Computational analysis and graphical presentation of protein sequence and structure. Sequences homologous to SdeA were selected from results generated by the BLAST server (NCBI). Edited sequences were aligned with Clustal Omega and coloured using the Multiple Align Show online server (http://www.bioinformatics.org/sms/index.html). Protein surface conservation was calculated using the online ConSurf server (http://consurf.tau.ac.il). All structural figures were generated using PyMOL (The PyMOL Molecular Graphics System, v.1.8, Schrödinger, LLC) except for the difference Fourier electron density map figure (Extended Data Fig. 9e), which was generated in Coot. The electrostatic surface potential was calculated using the APBS program (http://www.poissonboltzmann.org). The surface is coloured on the basis of electrostatic potential, with positively charged regions in blue (+4 kcal per electron) and negatively charged surfaces in red (−4 kcal per electron).

Ubiquitin-modification and RAB33B-ubiquitination assays. Ub-modification reactions were carried out by mixing 1 μM SdeA-core or SdeA-mART(A563-910) with 25 μM ubiquitin in a reaction buffer containing 50 mM NaCl and 50 mM Tris pH 7.5, in the presence or absence of 1 mM NAD+. The reactions were incubated for 1 h at 37 °C, and reaction products were assessed by both 8% native PAGE and 12% SDS-PAGE. Native gels were stained with Coomassie. To assay for PDE activity, SDS-PAGE gels were stained with Pro-Q Diamond phosphoprotein stain (Invitrogen). ADPR-Ub and PR-Ub migrate to the same position on a native gel (labelled as modified Ub). However, only PR-Ub is visible with the Pro-Q phosphoprotein stain, owing to its free phosphoryl group. RAB33B ubiquitination reactions were performed by adding 4 μM recombinant Flag-RAB33B to the Ub-modification reaction described above. The reaction products were analysed by SDS-PAGE and western blotting with an anti-Flag antibody (Sigma-Aldrich) at a 1:2,500 dilution. For the intracellular phosphoribosyl-ubiquitination assay of RAB33B, plasmids expressing Flag-RAB33B, GFP alone or the indicated GFP-tagged SdeA constructs were co-transfected into NIH HEK293T cells. Whole-cell lysates were subjected to immunoprecipitation with Flag beads, and the products were analysed by anti-Flag western blot. Expression of the GFP-SdeA constructs was analysed by anti-GFP western blot.
Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. Atomic coordinates and structure factors for the reported structures have been deposited in the Protein Data Bank under accession codes 6B7Q (Hg-bound SdeA), 6B7P (Se-SdeD), 6B7M (SdeD-Ub) and 6B7O (SdeD-Ub-ADPR-Ub). The data supporting the findings of the study are available within the paper and the Extended Data figures and tables. Further data are available from the corresponding author upon reasonable request. The raw images of electrophoresis gels and western blots can be found in Supplementary Fig. 1.
Extended Data Fig. 1 | Chemical structure of phosphoribosyl-linked ubiquitination catalysed by SdeA. Phosphoribosyl-linked ubiquitination catalysed by SdeA involves two enzymatic activities of SdeA. First, using its mART activity, SdeA catalyses the ADP-ribosylation of Ub to generate ADPR-Ub by consuming an NAD+ molecule. Second, using its PDE activity, SdeA catalyses the conjugation of ADPR-Ub to a serine residue of substrate proteins to generate protein-PR-Ub and AMP. In the absence of substrate proteins, the PDE domain of SdeA can simply hydrolyse ADPR-Ub to PR-Ub and AMP using a water molecule.
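The two activities described in this legend can be summarized by the reaction scheme below. It is a sketch. Nicotinamide, the product of the first step, is the standard by-product of NAD+-dependent ADP-ribosylation and is not named in the legend itself.

```latex
\begin{align*}
\text{Ub} + \text{NAD}^{+} &\xrightarrow{\text{mART}} \text{ADPR-Ub} + \text{nicotinamide}\\
\text{ADPR-Ub} + \text{protein-Ser} &\xrightarrow{\text{PDE}} \text{protein-PR-Ub} + \text{AMP}\\
\text{ADPR-Ub} + \text{H}_2\text{O} &\xrightarrow{\text{PDE}} \text{PR-Ub} + \text{AMP}
\end{align*}
```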


Extended Data Fig. 2 | Structure of the PDE domain of SdeA. a, Model of the PDE domain of SdeA in ribbon representation. Two invariant histidine residues (H277 and H407) are shown in stick representation and labelled. b, Surface representation of the PDE domain. The two invariant histidine residues (shown in red) are situated at the bottom of a deep groove. c, The PDE domain from a Legionella effector (lpg1496). Notably, the all-α-helical structural core of this PDE domain superimposes readily onto that of SdeA, with a root mean square deviation (r.m.s.d.) of 1L9A over 225 aligned Cα atoms. A prominent difference between the two PDE domains is that some loops connecting the α-helices (indicated by dashed outlines) vary both in primary sequence and in length (Extended Data Fig. 3). d, Surface residue conservation analysis of the PDE domain. Conservation was calculated using the ConSurf server, with the most conserved residues coloured in purple and the least conserved residues in cyan. Note that the catalytic groove is enriched in the most conserved residues.
Extended Data Fig. 3 | Multiple sequence alignment of selected PDE domains from the SidE family effectors. Representative sequences corresponding to the PDE domain of SdeA (amino acids 222–502) were aligned using the MultAlin online server (http://www.bioinformatics.org/sms/index.html). Secondary structural elements are drawn above the alignment. The numbering for the SdeA sequence is marked on the top of the alignment and the numbering for the SdeD sequence is marked below. Variable loop regions are outlined with dashed squares. Conserved residues located within the catalytic groove are highlighted with purple dots. In particular, three essential catalytic residues (H277, H407 and E340) are highlighted with red stars below the sequences. SdeD residues in close contact with Ub1 (Fig. 3a) are marked by blue triangles at the bottom of the sequences. The predicted Ub1-interacting residues of the PDE domain of SdeA (Fig. 3e) are depicted by red triangles on the top of the sequences. Among the potential Ub1-interacting residues, V414, E454 and E465 of SdeA, which were used in the mutagenesis studies in Fig. 3f, g, are marked with solid red triangles. Entrez database accession numbers are as follows: SdeA, GI: 1064303039; SidE, GI: 52840489; SdeB, GI: 52842367; SdeC, GI: 52842370; lpg2154, GI: 52842368; and SdeD, GI: 52842717.


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


Extended Data Fig. 4 | Structural comparison of the SdeA mART domain with other mART domains from bacterial toxins. a, Model of the main lobe of the SdeA mART domain in ribbon representation. The main lobe is composed of two nearly perpendicular β-sheets forming a two-layered β-sandwich core. Residues comprising the three mART catalytic signature motifs ((F/Y)-(R/H), STS and EXE) are shown as sticks. b, HopU1 from P. syringae (PDB ID: 3U0J) in ribbon representation. c, Structural superimposition of the mART domains from SdeA (gold) and HopU1 (blue). d, Iota-toxin from C. perfringens (PDB ID: 4H03). e, Iota-toxin in complex with NAD+ (red spheres). f, Structural overlay of the mART domains from SdeA (gold) and iota-toxin (cyan). g, Cartoon diagram of the α-helical lobe of the SdeA mART domain. The α-helical lobe consists of eight α-helices, and the three structurally conserved α-helices (α6–α8) are coloured in brown. h, Cartoon diagram of the α-helical lobe of HopU1, with the three equivalent α-helices (α4–α6) highlighted in blue. i, Structural overlay of the α-helical lobes of SdeA and HopU1.
Extended Data Fig. 5 | Multiple sequence alignment of the mART domains. Representative sequences corresponding to the mART domains of SdeA (amino acids 593–904) were aligned using MultAlin. Secondary structural elements are drawn above the alignment, in cyan for the α-helical lobe and gold for the main lobe of the mART domain. The numbering for the SdeA sequence is marked on the top of the alignment. Residues comprising the catalytically important (F/Y)-(R/H), STS and EXE motifs are marked with red stars. Residues in the α-helical lobe that form, or are close to, the conserved surface patch and are essential for mART activity (Extended Data Fig. 7) are marked with purple triangles. D622, which is conserved but has no effect on mART activity, is marked with a green triangle. Entrez database accession numbers are as follows: SdeA, GI: 1064303039; SidE, GI: 52840489; SdeB, GI: 52842367; SdeC, GI: 52842370; SidE Legionella cincinnatiensis, GI: 966421657; LLO_3095, GI: 489730495; SidE Legionella gratiana, GI: 966468332; SidE Legionella santicrucis, GI: 966496250; LLO_0424, GI: 502743808.
Extended Data Fig. 6 | The α-helical lobe of the SdeA mART domain has an extended conformation compared to other mART proteins. a, Structural superimposition of SdeA onto the HopU1 structure, referenced on the main lobe of the mART domain. SdeA is coloured using the same scheme as Fig. 1b. The main lobe of HopU1 is coloured in blue and its α-helical lobe is in grey. The α-helical lobe of the SdeA mART domain extends away from the main lobe, whereas its counterpart in HopU1 packs in close contact with the main lobe. b, Structural model of SdeA with the α-helical lobe in a closed conformation. The α-helical lobe was positioned by a structural overlay of the three structurally conserved α-helices identified in all mART domains (Extended Data Fig. 4g–i). c, Experimental and theoretical SAXS curves for SdeA-core and the resulting best-fit AllosMod structure for the determined structure (open) and the modelled closed conformation, with residual plots shown below. Best-fit χ² values are indicated. d, Overlay of the determined SdeA-core structure (PDE, green; mART main lobe and α-helical lobe, yellow) and the best-fit AllosMod structures for the open (magenta) and closed (cyan) conformations. e, Summary of the experimentally derived SAXS parameters for SdeA-core, the AllosMod-derived best-fit Rg, and the average FOXS χ² for the five best-fitting AllosMod models compared to the experimental SAXS curve. The program Primus was used to calculate the radius of gyration (Rg) and maximum linear dimension (Dmax). The Kratky plot (I(q)q² versus q) and the distance-distribution plot P(r) obtained from GNOM are shown. f, Overlay of SdeA-core SAXS curves in the presence of 4.7 mM NAD+ (10× protein concentration), with corresponding Guinier Rg values. Data shown in c, e and f are representative of two biologically independent experiments.
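Guinier Rg values such as those in panel f come from the low-q approximation ln I(q) ≈ ln I(0) − q²Rg²/3, which is commonly applied where qRg ≲ 1.3. A minimal NumPy sketch of this fit is shown below. It is an illustrative stand-in for the Primus analysis named above, not that program's algorithm. The 1.3 cutoff, the iterative window selection and the function name `guinier_rg` are assumptions, and q is assumed to be sorted in ascending order.

```python
import numpy as np


def guinier_rg(q, intensity, qrg_max=1.3, n_start=10, n_iter=20):
    """Estimate Rg and I(0) from the Guinier law ln I = ln I(0) - (Rg**2 / 3) q**2.

    The fit starts on the first n_start points and is then refined so that
    only points with q * Rg <= qrg_max are used. q must be sorted ascending
    and intensities must be positive.
    """
    q = np.asarray(q, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    mask = np.zeros(q.size, dtype=bool)
    mask[:n_start] = True
    for _ in range(n_iter):
        # Linear fit of ln I against q^2: slope = -Rg^2/3, intercept = ln I(0).
        slope, intercept = np.polyfit(q[mask] ** 2, np.log(intensity[mask]), 1)
        if slope >= 0:
            raise ValueError("non-negative Guinier slope; check the low-q data")
        rg = np.sqrt(-3.0 * slope)
        new_mask = q * rg <= qrg_max
        if new_mask.sum() < 3 or np.array_equal(new_mask, mask):
            break
        mask = new_mask
    return rg, np.exp(intercept)


if __name__ == "__main__":
    q = np.linspace(0.005, 0.3, 200)
    ideal = 100.0 * np.exp(-(q * 30.0) ** 2 / 3.0)  # synthetic profile with Rg = 30
    print(guinier_rg(q, ideal))
```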
indispensable for Ub ADP-ribosylation. a, Surface representation of residue conservation of SdeA (the most conserved residues are shown in purple and the least conserved residues in cyan). Surface residue conservation was calculated using the ConSurf server. An expanded view of a surface cluster that consists of the most conserved residues on the α-helical lobe is shown on the right. b, Analysis of in vitro ubiquitin-
lobe. The reaction products were analysed using native PAGE with Coomassie blue stain (top) and SDS-PAGE with Pro-Q phosphoprotein stain (bottom). c, SDS-PAGE analysis of the proteins in the reaction mixture. Data shown in b and c are representative of three independent experiments. Uncropped gels are shown in Supplementary Fig. 1.
Extended Data Fig. 8 | The interaction between Ub and the SdeD PDE domain. a, NMR ¹H-¹⁵N HSQC TROSY spectral overlay of 150 μM Ub (black) in the presence or absence of 300 μM SdeA PDE domain (cyan). Ub binds very weakly to SdeA, as shown by minimal changes in the ¹H-¹⁵N peaks of Ub. b, Spectral overlay of 150 μM Ub (black) with 75 μM SdeD PDE. Ub binds with higher affinity to SdeD, as evidenced by peak broadening and/or disappearance of Ub resonances. c, Residues whose resonances are most affected by the presence of SdeD are mapped in red on a cartoon structure of Ub. d, PDE domain of SdeD (grey) shown in ribbon representation. Two invariant histidine residues (H67 and H189) are shown in stick representation (cyan). The variable loop unique to SdeD is outlined. e, Structural overlay of the PDE domain of SdeD (grey) and the PDE domain of SdeA (green). The overall structures of these two PDE domains are very similar, with an r.m.s.d. of 1.73 Å over 251 overlaid Cα atoms. f, Two orthogonal views of the SdeD PDE domain in complex with two Ub molecules in ribbon representation: Ub1 (cyan) and Ub2 (blue). Ub1 binds at the opening of the PDE catalytic groove, with its R42 side chain inserted into the groove. Ub2 binds a region on the opposite side of the catalytic groove. g, Structural superimposition of SdeA onto the SdeD PDE-Ub complex, referenced on the PDE domain. The PDE domain of SdeA is shown in green and the mART domain is shown in gold. Note that Ub1 makes no clashing contacts with the superimposed SdeA molecule, whereas the Ub2 binding site largely overlaps with the space occupied by the mART domain in SdeA. This analysis suggests that the mode of Ub1 binding seen in the SdeD PDE domain probably also applies to the PDE domain of SdeA. However, the second Ub-binding site observed in SdeD might not exist in SdeA. Experiments in a and b were repeated independently twice.
Extended Data Fig. 9 | Crystal structure of the PDE domain of SdeD in complex with ADPR-Ub and Ub. a, SdeD PDE domain H67A mutant in complex with both ADPR-Ub and unmodified Ub. The crystal was obtained by mixing the SdeD PDE H67A mutant, ADPR-Ub and Ub in a 1:2:3 molar ratio (see the ‘Protein crystallization’ section of the Methods for details). The PDE domain is shown in grey, the bound ADPR-Ub in cyan and the unmodified Ub in blue. The unmodified Ub binds a region identical to that of Ub2 in the SdeD-Ub complex shown in Extended Data Fig. 7d. ADPR-Ub binds in a mode similar to that of Ub1 in the SdeD-Ub complex, with the ADPR moiety fitting into the catalytic groove. b, An orthogonal view of a. c, d, Two orthogonal views of the complex shown in a in surface representation. Note that the ADPR moiety, shown in light green, fits deeply into the catalytic groove. e, Difference density generated by refinement against the structural model without the ADPR portion. The Fo − Fc difference map is shown in green and contoured at 1σ.
Extended Data Table 1 | X-ray data collection and structural refinement statistics.
*Values in parentheses are for the highest-resolution shell.
Insights into catalysis and function of phosphoribosyl-linked serine ubiquitination

Sissy Kalayil, Sagar Bhogaraju, Florian Bonn, Donghyuk Shin, Yaobin Liu, Ninghai Gan, Jérome Basquin, Paolo Grumati, Zhao-Qing Luo & Ivan Dikic


Conventional ubiquitination regulates key cellular processes by catalysing the ATP-dependent formation of an isopeptide bond between ubiquitin (Ub) and primary amines in substrate proteins. Recently, the SidE family of bacterial effector proteins (SdeA, SdeB, SdeC and SidE) from pathogenic Legionella pneumophila were shown to use NAD+ to mediate phosphoribosyl-linked ubiquitination of serine residues in host proteins. However, the molecular architecture of the catalytic platform that enables this complex multistep process remains unknown. Here we describe the structure of the catalytic core of SdeA, comprising mono-ADP-ribosyltransferase (mART) and phosphodiesterase (PDE) domains, and shed light on the activity of two distinct catalytic sites for serine ubiquitination. The mART catalytic site is composed of an α-helical lobe (AHL) that, together with the mART core, creates a chamber for NAD+ binding and ADP-ribosylation of ubiquitin. The catalytic site in the PDE domain cleaves ADP-ribosylated ubiquitin to phosphoribosyl ubiquitin (PR-Ub) and mediates a two-step PR-Ub transfer reaction: first to catalytic histidine 277 (forming a transient SdeA H277-PR-Ub intermediate) and subsequently to a serine residue in host proteins. Structural analysis revealed a substrate-binding cleft in the PDE domain, juxtaposed with the catalytic site, that is essential for positioning serines for ubiquitination. Using degenerate substrate peptides and newly identified ubiquitination sites in RTN4B, we show that disordered polypeptides with hydrophobic residues surrounding the target serine residues are preferred substrates for SdeA ubiquitination. Infection studies with L. pneumophila expressing substrate-binding mutants of SdeA revealed that substrate ubiquitination, rather than modification of the cellular ubiquitin pool, determines the pathophysiological effect of SdeA during acute bacterial infection.
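The two-step PR-Ub transfer described in the abstract can be written as the reaction scheme below, where H277 denotes the catalytic histidine of the SdeA PDE domain. This is a sketch. Release of AMP at the first step is inferred from the overall conversion of ADPR-Ub to PR-Ub plus AMP and is not stated at this point in the text. The hydrolysis branch reflects the proposed attack on H277-PR-Ub by a water molecule instead of a substrate serine.

```latex
\begin{align*}
\text{ADPR-Ub} + \text{H277} &\longrightarrow \text{H277--PR-Ub} + \text{AMP}\\
\text{H277--PR-Ub} + \text{substrate-Ser} &\longrightarrow \text{substrate-Ser--PR-Ub} + \text{H277}\\
\text{H277--PR-Ub} + \text{H}_2\text{O} &\longrightarrow \text{PR-Ub} + \text{H277}
\end{align*}
```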

Fig. 2 | Catalytic mechanism of the SdeA PDE domain. a, Active site of the SdeA PDE domain depicting residues that are important for catalysis. Distances between important catalytic residues are indicated in Å. b, In vitro ubiquitination assays with PDE histidine mutants. This experiment was repeated independently three times with similar results. c, The stabilized intermediate was analysed by targeted liquid chromatography with tandem mass spectrometry (LC-MS/MS) after tryptic digestion. We used low-energy HCD to target the phosphoramidate bond specifically for partial fragmentation. Besides the intact precursor, this created marker ions for the tryptic peptides of the active site of the SdeA PDE domain and PR-Ub. This experiment was repeated independently twice with similar results. d, Proposed catalytic cycle in the SdeA PDE domain mediated by E340, H277 and H407. Electron transfer is indicated by curved arrows. A detailed description of the mechanism can be found in the Supplementary Information. For gel source data, see Supplementary Fig. 1.

To understand the mode of ubiquitination by SdeA, we sought structural insights into the function of this enzyme. First, we identified SdeA residues 213 to 907 (SdeA213–907), comprising both the PDE and mART domains, as the minimal stable fragment that can ubiquitinate the known SdeA substrate Rab33b, albeit less efficiently than full-length SdeA (SdeAFL) (Extended Data Fig. 1a, b). We crystallized SdeA213–907 and determined its structure at 2.8 Å resolution (Supplementary Table 1, Supplementary Information). Each asymmetric unit contained one molecule of SdeA213–907 comprising three distinct domains (Fig. 1a). The PDE domain spans residues 222–593 and is α-helical. Structure comparison analysis revealed that the PDE domain of SdeA is most similar to that of the Legionella effector protein lpg1496 (PDB: 5BU2), with a root mean squared deviation (r.m.s.d.) of 2.3 Å over 239 Cα atoms. The closest structural mammalian homologue of the SdeA PDE domain is human SAMHD1, a dNTP hydrolase with functions in the innate immune response (r.m.s.d. of 4.1 Å over 165 Cα atoms). The mART domain is situated at the C terminus (residues 594–907) and comprises two distinct, spatially separated lobes: the α-helical lobe (AHL, residues 594–758) and the mART core (residues 759–907). The mART core interacts strongly with the PDE domain and is composed mostly of β-strands with a couple of α-helices. Unexpectedly, in our crystal structure the AHL is not in physical proximity to the mART core. This contrasts with the structures of other bacterial ADP-ribosylating enzymes, in which the AHL is an integral part of the mART domain and contributes to NAD+ binding and ADP-ribosylation of the substrate. The solution structure of SdeA213–907 determined by small-angle X-ray scattering (SAXS) revealed an orientation of the AHL in solution similar to that in the crystal (Extended Data Fig. 1c, Supplementary Table 2, Supplementary Information). Superimposition of the AHLs of SdeA and Vis toxin, a bacterial ADP-ribosyltransferase from Vibrio splendidus (PDB: 4Y1W), revealed a proximal conformation of the AHL that differed substantially from that seen in the crystal structure (Fig. 1b). We hypothesize that the AHL of SdeA213–907 could transiently adopt a conformation proximal to the mART core for NAD+ binding and processing (Fig. 1b). Consistent with this hypothesis, deletion of the AHL (residues 599–758) led to a complete loss of ADP-ribosylation of ubiquitin and of ε-NAD+ hydrolysis (Fig. 1c, Extended Data Fig. 2a). Mutation of residues in the two
flexible loops flanking the AHL affected substrate ubiquitination in SdeA213–907 but not in SdeAFL. This suggests that the dynamic conformational shift of the AHL occurs only in the context of SdeA213–907, whereas in SdeAFL the C-terminal region (CTR, residues 909–1499) fixes the AHL in the proximal, active position (Extended Data Fig. 2b, c). Accordingly, SdeAFL exhibited much greater NAD+ sensitivity in our in vitro ubiquitination experiments, completely modifying 10 μM ubiquitin with 20 μM NAD+, whereas the activity of SdeA213–907 increased gradually with increasing NAD+ concentration (Fig. 1d). Similarly, SdeAFL showed markedly faster ε-NAD+ hydrolysis kinetics in vitro than SdeA213–907 (Fig. 1e). SdeA213–907 did not detectably ubiquitinate Rab33b in HEK293T cells, perhaps owing to insufficient cellular NAD+ concentration (Extended Data Fig. 2d). Moreover, limited proteolysis of SdeA constructs with different C-terminal extensions showed that the construct ending at residue 1233 resists digestion, whereas shorter constructs are digested down to SdeA213–907. This indicates that the CTR induces a compact or closed state of the SdeA structure (Extended Data Fig. 2e). Mixing purified CTR (residues 909–1499) or a shorter CTR (residues 909–1233) with SdeA213–907 increased the catalytic activity of SdeA213–907 to the level of SdeAFL (Fig. 1e). These results are consistent with the crystal structure of a longer construct of SdeA in which the CTR stabilizes the proximal orientation of the AHL. Detailed analysis of the structure of the mART domain and its interaction with the PDE domain is presented in the Supplementary Information (Extended Data Figs. 3, 4).

The PDE domain of SdeA hydrolyses ADP-ribosylated ubiquitin
(ADPR-Ub) and catalyses the transfer of ubiquitin to serine residues of
the substrate protein via a phosphoribose linker. The catalytic pocket
of PDE in SdeA is lined by several conserved histidines (Fig. 2a) and
mutation of H277 or H407 has been shown to abolish the activity of
the PDE domain. We hypothesized that the PR-ubiquitination reaction
by SdeA might take place via a transient intermediate involving
covalent attachment of phosphate to a catalytic histidine residue.
Phosphohistidine intermediates are difficult to observe owing to their
extreme lability. Therefore, we introduced a few key mutations
into the catalytic pocket of the PDE domain to stabilize a potential
intermediate. Unexpectedly, we observed an SdeA-Ub intermediate
that was sensitive to heat treatment only in the H407N mutant (Fig. 
2b, Extended Data Fig. 5a). Using mass-spectrometry analysis of the 
tryptic digest of the intermediate reaction, we could identify an ion with 
the exact mass of ubiquitin (Ub) 34-48 bridged to SdeA 275-284 by 
phosphoribose. We applied low-energy HCD (higher-energy collisional
dissociation) fragmentation to specifically cleave the phosphoramidate
bond, and confirmed that the generated fragment ion corresponded 
to SdeA 275-284 and phosphoribosylated Ub 34-48 (Fig. 2c). This 
analysis revealed that H277 of SdeA is linked by phosphoribose to Ub 
through a phosphoramidate bond. We further validated the identity 
of the histidine-bridged intermediate using the ion series of the peptide
backbones generated by high-energy fragmentation (Extended Data Fig. 5b).
Using rhodamine-labelled and haemagglutinin-labelled ubiquitin, we 
observed that both ubiquitin variants attached to SdeA213-907(H407N)
in a heat-dependent manner (Extended Data Fig. 5c, d). Upon heating,
the levels of the intermediate decreased with a concomitant increase of
PR-Ub, indicating that PR-Ub is attached to a catalytic histidine of 
SdeA (Extended Data Fig. 5e). Notably, residue E340 of SdeA forms a
hydrogen bond with H277, potentially activating this histidine as a
strong nucleophile (Fig. 2a). Both double mutants (H277A/H407N
and E340A/H407N) failed to form the intermediate, supporting our 
notion that H277 is the intermediate-forming residue and that E340 
has a critical role in the activation of H277 (Extended Data Fig. 5f). 
Based on the observed stabilization of the intermediate in H407 
mutants, we propose a general role for H407 in orienting and activat- 
ing a water molecule or the substrate serine for a nucleophilic attack 
on H277-PR-Ub (Fig. 2d). Accordingly, SdeA213-907(H407N) is defective
in substrate ubiquitination and only partially active in producing
PR-Ub (Extended Data Fig. 6a). We propose a two-step phosphoryl 
transfer reaction scheme for the PDE-mediated PR-ubiquitination of 
substrates (Fig. 2d). This model is also supported by the crystal structure
of ADPR-Ub in complex with the PDE domain of SdeD. In
addition to revealing critical roles for the two histidine residues, the 
structure of PDE also showed that side chains of Y347 and R413 are 
inserted into the catalytic centre and could potentially be involved in
either the phosphoryl transfer activity or the binding of ADPR-Ub
to SdeA (Fig. 2a). Mutating these residues to alanine inhibited
PR-ubiquitination by SdeA (Extended Data Fig. 6b, c). 
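
Written out as reaction equations, the proposed two-step phosphoryl transfer might read as follows. This is a sketch assembled only from species named in the text: AMP as the leaving group of step 1 follows the double-displacement model proposed in the Discussion, and the hydrolytic branch (2b) reflects the suggested role of H407 in activating either a water molecule or the substrate serine. Protons and charges are not balanced.

```latex
\begin{aligned}
\text{(1)}\quad & \text{SdeA-H277} + \text{ADPR-Ub} \longrightarrow \text{SdeA-H277-PR-Ub} + \text{AMP} \\
\text{(2a)}\quad & \text{SdeA-H277-PR-Ub} + \text{substrate-Ser-OH} \longrightarrow \text{substrate-Ser-PR-Ub} + \text{SdeA-H277} \\
\text{(2b)}\quad & \text{SdeA-H277-PR-Ub} + \text{H}_2\text{O} \longrightarrow \text{PR-Ub} + \text{SdeA-H277}
\end{aligned}
```

In this reading, the H407N mutation slows both (2a) and (2b), which would account for the accumulation of the intermediate and the reduced PR-Ub production.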

To gain insights into substrate recognition by SdeA, we set out to 
identify ubiquitination sites within the recently described SdeA substrate
RTN4B using a mass-spectrometry-based approach. We found
two ubiquitination sites in the cytoplasmic part of RTN4B, where each
site contained two serine residues that are targeted by SdeA for ubiquitination
(Extended Data Fig. 7a, b). RTN4B peptides of about 13
residues containing the target serine residues served as substrates for 
SdeA (Fig. 3a). Alignment of all peptide sequences containing known 
SdeA target serines using the Seq2Logo server generated a sequence motif
(Fig. 3b, Extended Data Fig. 7c), in which the target serine is in the 
vicinity of hydrophobic residues and is flanked by proline residues. 
Furthermore, we produced 47 degenerate peptides based on one of
the RTN4B ubiquitination sites and performed ubiquitination assays 
with all of them (Supplementary Table 3, Supplementary Information, 
Fig. 3c). Individual peptide sequences and their fractional activity were 
given as inputs to the NNalign server to generate a sequence motif. The
resulting sequence motif confirms the importance of hydrophobic 
residues surrounding the target serine sites for SdeA ubiquitination 
(Fig. 3d). Notably, we have identified an AMP analogue, adenosine 
5'-O-thiomonophosphate (AMPS), that acts as a low-affinity inhibitor 
of substrate ubiquitination by the SdeA PDE domain by affecting sub- 
strate positioning (Supplementary Information, Extended Data Fig. 8). 
AMPS also inhibited the ubiquitination activity of SdeC, a paralogue
of SdeA (Extended Data Fig. 8g), suggesting that this lead compound
could aid the development of novel inhibitors against this class
of Legionella toxins.
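
To make the idea of a sequence motif concrete, the snippet below sketches how per-position conservation can be scored from aligned target peptides. It is a minimal illustration only: it computes Shannon information content against a flat 20-residue background, not the actual scoring used by Seq2Logo or NNalign (which add background frequencies, pseudocounts and, for NNalign, a trained neural network). The function name and the peptides in the usage line are hypothetical, not SdeA data.

```python
import math
from collections import Counter

# Flat background over the 20 standard amino acids (an assumption; dedicated
# logo tools can use measured background frequencies instead).
N_AMINO_ACIDS = 20


def information_content(aligned_peptides):
    """Return, for each alignment column, (bits, residue frequencies).

    bits = log2(20) - Shannon entropy of the column, so a fully conserved
    column scores ~4.32 bits. No pseudocounts or sequence weighting are used.
    """
    if not aligned_peptides:
        raise ValueError("need at least one peptide")
    length = len(aligned_peptides[0])
    if any(len(p) != length for p in aligned_peptides):
        raise ValueError("peptides must be aligned to the same length")
    max_bits = math.log2(N_AMINO_ACIDS)
    profile = []
    for pos in range(length):
        counts = Counter(p[pos] for p in aligned_peptides)
        total = sum(counts.values())
        freqs = {aa: c / total for aa, c in counts.items()}
        entropy = -sum(f * math.log2(f) for f in freqs.values())
        profile.append((max_bits - entropy, freqs))
    return profile


# Hypothetical peptides centred on a target serine (position 0).
peptides = ["PLSVP", "PISLP", "PVSIP"]
for pos, (bits, freqs) in enumerate(information_content(peptides), start=-2):
    print(pos, round(bits, 2), freqs)
```

Conserved flanking prolines and the central serine score highest, while the variable hydrophobic positions score lower; a real motif analysis would need many more sequences for stable estimates.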

Further inspection of the SdeA PDE domain structure revealed the 
existence of a cleft on the surface that leads to the active site of the 
PDE containing the catalytic residue H277 (Fig. 4a). We hypothesized 
that this cleft could bind and position polypeptides containing sub- 
strate serines. We identified several SdeA cleft mutants (Fig. 4b) that 
are associated with a substantial decrease in substrate ubiquitination, 
while showing negligible effect on phosphoribosylation of ubiquitin 
(Fig. 4c, Extended Data Fig. 9a). Among the mutants, M408A, L411A 
and M408A/L411A had the biggest effect on Rab33b ubiquitination in 
both SdeA213-907 and SdeAFL. These mutations also affected ubiquitination
of RTN4B, indicating that SdeA recognizes multiple structurally
different substrates via this region (Fig. 4d). 

The identification of substrate-binding mutants of SdeA enabled
us to investigate which of its two functions (phosphoribosylation of
ubiquitin or substrate ubiquitination) is physiologically relevant.
The L. pneumophila ΔsidEs mutant was complemented with either
wild-type SdeAFL or various substrate-binding mutants (Extended
Data Fig. 9b). Both the M408A and M408A/L411A mutants failed
to complement the growth of the bacterial strain lacking
the sidE effector family in the amoeba host Dictyostelium discoideum
(Fig. 4e, Extended Data Fig. 9c), and also failed to restore the
recruitment of RTN4 to the Legionella-containing vacuole by the ΔsidEs
mutant during infection of primary mouse macrophages (Fig. 4f, Extended
Data Fig. 9d). Together, our results indicate that targeting of specific
substrates for ubiquitination, rather than the modification of ubiquitin,
is the central function of SdeA in acute bacterial pathogenicity.

The SdeA structure and the detailed biochemistry presented
here give us a first-hand glimpse into the atomic details of the
PR-ubiquitination catalysed by the SidE family of bacterial enzymes,
and enable us to pin down substrate ubiquitination as the pathogenic
principle of SdeA. PR-ubiquitination by the PDE domain progresses
through a transient intermediate in the form of SdeA H277-PR-Ub,
which is subsequently attacked by the OH group of the target serine of
the substrate for successful ubiquitin transfer. This suggests a double
displacement mechanism for PDE catalysis in which the binding of
PR-ubiquitination substrates at the PDE active site may first require
release of AMP generated during the intermediate formation. This is
consistent with the juxtaposition of the catalytic groove of the PDE domain
and the substrate-binding cleft. Notably, the active sites of mART and PDE
face opposite sides of the molecule (Fig. 1a, b), hinting that there may not
be a direct transfer of ADP-ribosylated ubiquitin between the catalytic
centres of the mART and PDE domains. However, potential dimerization
of SdeA, as observed with purified proteins in solution (Extended
Data Fig. 10), may enable these two catalytic sites to face each other
in trans. Sequence analysis of SdeA substrates revealed that the target
serine residues could occur in disordered regions, in line with the
limited size of the substrate-binding cleft in the SdeA PDE domain
(Fig. 4b). On the basis of the substrate recognition motif identified in
this study, we propose that SdeA could be a broad-specificity ligase that
targets serine residues in disordered regions of multiple substrates.
Therefore, the specificity of SdeA-mediated ubiquitination during
Legionella infection could be conferred by its recruitment to the
endoplasmic reticulum, where the currently identified in vivo substrates
of SdeA reside. The dissection of PR-ubiquitination catalysis by SdeA
presented here may also aid the future discovery of related mammalian
enzymes.

Fig. 3 | Substrate recognition by SdeA. a, In vitro ubiquitination
of RTN4B peptides (Pept1 and Pept2) and Rab33b by SdeAFL. b,
Alignment of target serine sequences of SdeA identified so far. c, In vitro
ubiquitination of 47 degenerate peptides designed with RTN4B substrate
peptide as template. d, Sequence motif generated by NNalign software,
resulting from analysis of in vitro ubiquitination data of the peptides.
Experiments were repeated independently twice with similar results (a, c).
For gel source data, see Supplementary Fig. 1.

Fig. 4 | Substrate-binding site in SdeA PDE domain. a, SdeA PDE
domain in surface representation with the catalytic site coloured
orange. b, Amino acid residues in the PDE active site (orange) and
the putative substrate-binding cleft (magenta). c, In vitro Rab33b
ubiquitination assays with SdeA213-907 substrate-binding mutants. d, In
vitro RTN4B ubiquitination assays with SdeA213-907 substrate-binding
mutants. e, Fold change in colony forming units (CFU) in wild-type
L. pneumophila and the ΔsidEs strain complemented with mutants
defective in substrate ubiquitination. The catalytically dead SdeA mutant
EE/AA (E860A/E862A) was used as a control (n = 3 biological replicates).
Exact P values: ΔsidEs versus pSdeA (SdeA plasmid) = 0.024; pSdeA versus
pSdeA EE/AA = 0.032; pSdeA versus pSdeA M408A = 0.028; pSdeA
versus pSdeA M408A/L411A = 0.023, as analysed by two-tailed t-test. f,
Percentage of RTN4-positive vacuoles containing relevant L. pneumophila
strains (n = 3 biological replicates). Exact P values: ΔsidEs versus
pSdeA = 0.0015; pSdeA versus pSdeA EE/AA = 0.0015; pSdeA versus
pSdeA M408A = 0.0034; pSdeA versus pSdeA M408A/L411A = 0.0051, as
analysed by two-tailed t-test. Data shown as mean ± s.e.m. *P < 0.05,
**P < 0.01 (e, f). Experiments shown in c and d were repeated independently
twice with similar results. For gel source data, see Supplementary Fig. 1.
METHODS


Expression and purification. Expression and purification of SdeA and Rab33b 
have previously been described. In brief, T7 Express cells were transformed with
wild-type SdeA (UniProt accession code: Q5ZTK4) or mutant constructs cloned
in a modified pET21a vector with a C-terminal CPD (cysteine protease domain)-His
tag. Rab33b was cloned into a pET21a vector with a C-terminal His tag. For
selenomethionine-labelled SdeA213-907, the plasmid was transformed into B834
competent cells and was expressed using selenomethionine-containing minimal
medium (Molecular Dimensions). T7 Express transformed cells were grown in LB
medium at 37 °C to an optical density (OD) of 0.6-0.8, induced with 0.5 mM IPTG
(isopropyl β-D-1-thiogalactopyranoside), grown overnight at 18 °C and collected.
The cell pellet was resuspended in a buffer containing 50 mM Tris-HCl pH 7.5, 
300 mM NaCl, 10% (v/v) glycerol, 1 mM PMSF, DNAse and protease inhibitor 
cocktail tablets (Roche). The cells were lysed by sonication. Clarified supernatant
(12,000 r.p.m., 4 °C, 40 min) was incubated for 1 h with pre-equilibrated
Talon beads before being washed three times with 50 mM Tris-HCl pH 7.5, 300 
mM NaCl, 10% (w/v) glycerol. For RTN4B purification, the clarified supernatant 
was spun again (40,000 r.p.m., 4 °C, 90 min) to collect membrane fractions. The 
membranes were homogenized and solubilized in the presence of 1% n-dodecyl
β-D-maltoside (DDM). For the rest of the RTN4B purification, 0.05% (w/v) DDM
was maintained in the buffer. For Rab33b purification, the protein was eluted using 
200 mM imidazole after binding to Talon beads. The CPD-His tag of SdeA and 
RTN4B was cleaved off the protein while it was still bound to Talon beads using 
100 µM phytic acid. For biochemical assays, the buffer was exchanged to a final
buffer of 10 mM HEPES pH 7.5, 150 mM NaCl and 1 mM TCEP. For crystallization,
SdeA213-907 was loaded onto a Q-Sepharose column after CPD tag cleavage.
SdeA213-907 eluted in the flowthrough, while most of the impurities were bound to the
column. The fraction containing the protein was then concentrated before injecting 
into a Superdex 75 16/60 size-exclusion chromatography column pre-equilibrated 
with 10 mM HEPES pH 7.5, 150 mM NaCl, 1 mM TCEP. The protein eluted in 
a single peak and fractions were pooled together and concentrated to 25 mg/ml 
before setting up crystallization screens. 

Limited proteolysis. One milligram of SdeA193-998 was incubated with 50 µg GluC
in 20 mM HEPES pH 7.5, 50 mM NaCl and 10 mM MgSO4 for 1 h on ice. This
was followed by size-exclusion chromatography of the reaction on a Superdex
75 16/60 column equilibrated with 10 mM HEPES pH 7.5, 150 mM NaCl. The
fractions were pooled and analysed by matrix-assisted laser desorption/ionization
(MALDI) mass spectrometry. The protein fragment of interest was then
identified based on its exact mass using the ExPASy server tool FindPept.
Various constructs of SdeA were also similarly proteolysed by GluC and analysed
by Coomassie-stained SDS gel.
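
The FindPept step amounts to asking which contiguous stretch of the input sequence has a mass matching the measured one. A minimal sketch of that search follows; it is not FindPept itself. Assumptions: cleavage is treated as unspecific (GluC's preference for Glu/Asp is not enforced), no modifications are considered, the observed value is a neutral mass (subtract ~1.007 Da from a singly protonated MALDI ion first), and average residue masses are used because an intact ~80-kDa fragment is not resolved isotopically. Function names and the toy sequence are hypothetical.

```python
# Average residue masses (Da) of the 20 standard amino acids.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_AVERAGE = 18.0153  # added once per free chain (N- and C-termini)


def fragment_mass(seq):
    """Average neutral mass of an unmodified fragment."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER_AVERAGE


def find_fragments(sequence, observed_mass, tolerance, first_residue=1):
    """Return (start, end, mass) for every contiguous fragment within tolerance (Da).

    first_residue sets the numbering of sequence[0], e.g. 193 for a construct
    that starts at residue 193.
    """
    hits = []
    for start in range(len(sequence)):
        mass = WATER_AVERAGE
        for end in range(start, len(sequence)):
            mass += AVERAGE_RESIDUE_MASS[sequence[end]]
            if mass > observed_mass + tolerance:
                break  # extending the fragment can only increase its mass
            if abs(mass - observed_mass) <= tolerance:
                hits.append((start + first_residue, end + first_residue, round(mass, 2)))
    return hits


# Toy example with a hypothetical sequence.
print(find_fragments("MKEGAGSEK", fragment_mass("GAGS"), tolerance=0.5))
```

With a realistic tolerance several candidate fragments can match the same mass, which is why the limited-proteolysis result was cross-checked on Coomassie-stained gels.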

Crystallization. The purified protein was clarified by centrifugation (10,000 r.p.m., 
10 min, 4 °C) before we set up crystallization plates. Sitting drop and hanging drop 
sparse matrix screens in 96-well format were set up with 125 nl protein and 125 
nl precipitant solution. The protein crystallized in 100 mM Bis-Tris propane pH
7.0-8.0, 0.1-0.2 M sodium citrate tribasic dihydrate, 20-30% (w/v) PEG 3350 at
20 °C. The morphology of the thin plate-like crystals was improved using 0.5-1% 
(v/v) ethylene glycol and 50-200 mM non-detergent sulfobetaine-201 (NDSB-201) 
as additives. The crystals were cryoprotected using mother liquor supplemented 
with 15% (v/v) glycerol and 10% (v/v) ethylene glycol and flash frozen in liquid 
nitrogen. SeMet-labelled SdeA crystallized in similar conditions to wild-type SdeA.

Data collection, data processing and structure solution. Both SeMet and native
diffraction data were collected at the PXIII beamline of the Swiss Light Source,
Villigen. Native data were collected at a wavelength of 1.00003 Å. The data were
processed using XDS. SAD (single-wavelength anomalous dispersion) data from SeMet
crystals were collected at a wavelength of 0.97927 Å and processed using XDS, and
the phases were calculated using PHENIX AutoSol. Using these initial phases,
Buccaneer was used to build most of the PDE domain and the well-ordered segments
of the mART and AHL. After manual correction of the Buccaneer output model,
MR-SAD (molecular replacement-single-wavelength anomalous dispersion) was
performed using PHENIX AutoSol to further improve the phases. Further building
was done manually in Coot using SeMet positions as a guide. Using the model
obtained from experimental phases, molecular replacement was performed for the
native data of SdeA extending to 2.8 Å. Further iterative cycles of manual building
and refinement were performed using Coot and the refinement programs phenix.refine
and BUSTER (Global Phasing). Ramachandran statistics for the refined SdeA
core structure: favoured, 93.6%; allowed, 6.4%; outliers, none.

In vitro ubiquitination assays. SdeA ubiquitination experiments were done as 
described. In brief, 2.5 µg purified untagged ubiquitin and 2 µg Rab33b were
incubated with 1 µg SdeA (FL, 213-907 wild-type and variants, and SdeA PDE
domain) at 37 °C for 30 min in the presence or absence of 200 µM NAD+ in a
buffer containing 50 mM Tris-HCl pH 7.5, 50 mM NaCl in a final reaction
volume of 30 µl. For reactions involving the SdeA PDE domain, purified
ADP-ribosylated ubiquitin (2.5 µg) was used instead of NAD+. ADP-ribosylated
ubiquitin was generated using the SdeA H277A mutant and purified using size-exclusion
chromatography. The reaction mixture was subjected to SDS-PAGE followed by 
Coomassie staining. Alternatively, the reaction mixture was subjected to SDS- 
PAGE followed by western blotting using ubiquitin antibody and pan-ADP ribose 
antibody (Millipore) or to phosphostaining to identify PR-Ub (Pro-Q Diamond
phosphostaining protocol, Thermo Fisher). For ubiquitination assays using FL
GFP-tagged SdeA constructs, we transfected 2 µg GFP-tagged SdeA constructs
into HEK293T cells cultured in 6-well plates. HEK293T cells were obtained from 
ATCC (ATCC CRL-3216) and authenticated using STR DNA profiling. All the cell 
lines used tested negative for mycoplasma. Cells were collected after 24 h and lysed 
in 150 µl lysis buffer (50 mM Tris-HCl pH 7.4, 150 mM NaCl, 1% Triton X-100,
10% glycerol, protease inhibitor cocktail and 1 mM PMSF). Using 15 µl GFP-Trap
beads, we purified the GFP-tagged SdeA proteins from the clarified lysate. After
extensive washing of the beads, an in vitro ubiquitination reaction was set up in
30 µl reaction buffer (50 mM Tris-HCl, 50 mM NaCl, pH 7.4) with 3 µg purified
Rab33b, 2.5 µg ubiquitin and 0.2 mM NAD+. The reaction was performed at 37 °C
in a thermomixer while shaking. After 30 min, the reaction was stopped by adding 
SDS loading buffer and the samples were analysed using Coomassie staining of 
SDS gel and western blotting. The reaction mixture was analysed similarly to the 
assays using bacterially purified SdeA213-907, except for the use of complementary
ubiquitin antibodies (CS-Ub and abcam-Ub) to monitor ubiquitin modification.
Where indicated, ADPR-Ub was used as co-factor in reactions instead of Ub and
NAD+. Peptide ubiquitinations were carried out with 0.5 mM of each peptide as
substrate with SdeAFL. All experiments were repeated at least twice.

ε-NAD+ hydrolysis. For measuring the ubiquitin ADP-ribosylation kinetics of
SdeA213-907 and its mutants, we used an ε-NAD+ hydrolysis assay. Five micrograms
of wild-type or mutant SdeA was incubated with 100 µg ubiquitin in 100 µl of
reaction buffer containing 50 mM Tris pH 7.5, 50 mM NaCl. ε-NAD+ was added
to a final concentration of 1 mM to start the reaction. Fluorescence of ε-adenine
(excitation wavelength: 300 nm; emission wavelength: 410 nm) was monitored
using a plate reader at 25 °C at 1-min intervals. Constructs of the C-terminal
region (909-1499 and 909-1233) were added in 1.5-fold molar excess over
SdeA213-907 to test their effect on activity. All experiments were repeated at least twice.
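
A fluorescence trace read every minute is typically reduced to an initial rate by fitting a straight line to its early, linear portion. The sketch below shows that reduction; it is an illustration, not the authors' analysis. Assumptions: the first `n_points` readings are linear, no blank subtraction is applied, and rates stay in arbitrary fluorescence units per minute unless calibrated against an ε-ADP-ribose standard. Function names are hypothetical.

```python
def initial_rate(times_min, signal, n_points=None):
    """Least-squares slope of signal versus time (signal units per minute).

    If n_points is given, only the first n_points readings are fitted, on
    the assumption that this early part of the trace is linear.
    """
    if n_points is not None:
        times_min, signal = times_min[:n_points], signal[:n_points]
    n = len(times_min)
    if n < 2 or n != len(signal):
        raise ValueError("need at least two paired readings")
    mean_t = sum(times_min) / n
    mean_s = sum(signal) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in zip(times_min, signal))
    return sxy / sxx


def relative_activity(rate, reference_rate):
    """Activity of one construct relative to a reference construct."""
    return rate / reference_rate


# Hypothetical readings at 1-min intervals (not measured data).
core = initial_rate([0, 1, 2, 3, 4], [100, 104, 108, 112, 116])
core_plus_ctr = initial_rate([0, 1, 2, 3, 4], [100, 120, 140, 160, 175], n_points=4)
print(core, core_plus_ctr, relative_activity(core_plus_ctr, core))
```

Restricting the fit to early time points avoids underestimating the rate once substrate depletion bends the trace.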

Histidine intermediate analysis. This analysis was performed as described
previously with slight modifications. In brief, 4 µg wild-type or mutant
SdeA213-907 was incubated with 10 µg ubiquitin or labelled ubiquitin and 2 mM
NAD+ for 5 min at 37 °C. The reaction was stopped by transferring the contents
onto ice and adding 5× SDS loading buffer (pH 8.8). Gel electrophoresis and
transfer were conducted at 4 °C, and samples were analysed by Coomassie staining,
fluorescence scanning or western blotting. All experiments were repeated at least twice.

Small-angle X-ray scattering. SdeA213-907 was purified in 10 mM HEPES pH 7.5,
150 mM NaCl, 1 mM TCEP by size-exclusion chromatography (Superdex 200
Increase). Each fraction was collected and concentrated with centrifugal
concentration devices (50-kDa cut-off; Supplementary Table 2, Supplementary Information).
The scattering profile of flow-through buffer from the size-exclusion column was
recorded as reference buffer scattering. SAXS data were collected at beamline P12,
EMBL-DESY (Supplementary Table 2, Supplementary Information). Primary data
analysis was performed using PRIMUS from the ATSAS package. Bead modelling
was conducted with DAMMIF and DAMMIN from the ATSAS package. SAXS
curves calculated from the atomic model were fitted to the experimental data using
CRYSOL, and bead models were superimposed on the crystal structure using
SUPCOMB (ATSAS 2.8.3 package).
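
The Guinier region shown in Extended Data Fig. 1c rests on the approximation ln I(q) ≈ ln I(0) - (Rg²/3)q², valid roughly for q·Rg below about 1.3 in a globular particle. A minimal least-squares sketch of that fit is given below as a stand-in for the corresponding PRIMUS step, not a reproduction of it. Assumptions: background-subtracted intensities are positive, q is in Å⁻¹ (so Rg is in Å), and the caller has already chosen the low-q range; the function name is hypothetical.

```python
import math


def guinier_fit(q, intensity):
    """Fit ln I = ln I0 - (Rg^2 / 3) q^2 by ordinary least squares.

    Returns (rg, i0, max_q_rg); max_q_rg should stay below ~1.3 for the
    Guinier approximation to be trusted.
    """
    if len(q) < 2 or len(q) != len(intensity):
        raise ValueError("need at least two paired points")
    x = [qi * qi for qi in q]
    y = [math.log(ii) for ii in intensity]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("non-negative Guinier slope; check q range or aggregation")
    rg = math.sqrt(-3.0 * slope)
    i0 = math.exp(mean_y - slope * mean_x)
    return rg, i0, max(q) * rg


# Synthetic, noise-free curve for a particle with Rg = 30 Å (illustration only).
q = [0.005 + 0.001 * k for k in range(35)]
i = [100.0 * math.exp(-(qk * 30.0) ** 2 / 3.0) for qk in q]
print(guinier_fit(q, i))
```

An upward curvature at the lowest q, which shows up as a poor linear fit, is the usual sign of aggregation and is why the q-range check is returned alongside Rg.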

Mass spectrometry. For analysis of the histidine intermediate, an in vitro
ubiquitination reaction of SdeA213-907(H407N) was stopped after 5 min by
denaturation on ice in 5.3 M urea pH 8.8 for 10 min. The sample was diluted to
2 M urea with 50 mM ammonium bicarbonate (ABC) pH 8.8 and was loaded onto a
30-kDa filter (Amicon Ultra, 0.5 ml, Merck). (Ub-)SdeA was trypsinized according
to an adapted FASP protocol as previously described. In brief, the proteins were
washed four times with 100 µl ABC and, after adding Trypsin Gold (Promega) at an
enzyme:protein ratio of 1:2, tryptic digestion was performed for 20 min at 22 °C.
Tryptic peptides in 50 mM ABC pH 8.8 were loaded onto a 15-cm self-packed C18
column and separated with a short gradient (12 min from 10-38% buffer B (80%
acetonitrile, 0.1% formic acid)) on an EASY-nLC system and injected online into
a Q Exactive HF mass spectrometer. Targeted MS2 scans were used to specifically
fragment the bridged intermediate with different collision energies. For partial
and specific fragmentation of the phosphoramidate bond, a normalized collision
energy (NCE) of 15 was applied, and for fragmentation of the peptide backbone
NCE 30 was used. Spectra were annotated manually, and StavroX 3.6 was used
for additional identification of the peptide backbone fragments. Identification
of phosphoribose-bridged ubiquitination sites in RTN4B was done as described
before, by HCD and targeted ETD fragmentation after a modified FASP digest.
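
Identifying an ion with "the exact mass of ubiquitin 34-48 bridged to SdeA 275-284" means comparing the measured m/z with the summed monoisotopic masses of both peptides plus the bridge. The sketch below shows that arithmetic only. Assumptions: the canonical ubiquitin sequence, in which residues 34-48 read EGIPPDQQRLIFAGK (this peptide spans R42, presumably because the phosphoribose on that arginine blocks tryptic cleavage, although the text does not say so); the SdeA 275-284 sequence is not given here and so is left as an input; and the net mass added by the phosphoribose bridge is a user-supplied, hypothetical parameter rather than a value taken from the paper.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MONO = 18.010565
PROTON = 1.007276

UB_34_48 = "EGIPPDQQRLIFAGK"  # assumed canonical ubiquitin residues 34-48


def peptide_mass(seq):
    """Monoisotopic neutral mass of an unmodified linear peptide."""
    return sum(MONO_RESIDUE_MASS[aa] for aa in seq) + WATER_MONO


def bridged_mass(peptide_a, peptide_b, linker_increment):
    """Neutral mass of two peptides joined by a bridge.

    linker_increment is the net mass the bridge adds to the two free
    peptides (its composition minus atoms lost on linkage); hypothetical input.
    """
    return peptide_mass(peptide_a) + peptide_mass(peptide_b) + linker_increment


def mz(neutral_mass, charge):
    """m/z of the [M + zH]z+ ion."""
    return (neutral_mass + charge * PROTON) / charge


print(round(peptide_mass(UB_34_48), 4))
```

The candidate ion would then be accepted if its measured m/z lies within the instrument's ppm tolerance of `mz(bridged_mass(...), z)` for the observed charge state.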

L. pneumophila strains and host infections. L. pneumophila strains used in this
study were derivatives of the Philadelphia 1 strain Lp02 and were grown and
maintained on CYE (charcoal-yeast extract) plates or in N-(2-acetamido)-2-aminoethanesulfonic
acid (ACES)-buffered yeast extract (AYE) broth as previously described.
The sidE family in-frame deletion strain and complementation strains have been
described previously. The sdeA(M408A) and sdeA(M408A/L411A) mutant genes were
cloned into pZL507 for complementation. RAW264.7 cells were cultured in RPMI
1640 medium supplemented with 10% FBS. D. discoideum AX4 cells were cultured
in HL-5 medium and maintained in MB medium for infection as described.
Infection experiments were performed in triplicate.

Infection. L. pneumophila strains were grown to the post-exponential phase
(OD600 = 3.0-3.6) in AYE broth. Complementation strains were induced with 0.2
mM IPTG for 4 h at 37 °C before infection. RAW264.7 cells were infected with L.
pneumophila strains at an MOI of 10 for 2 h to detect the translocation of SdeA
and its mutants. RAW264.7 cells were collected and lysed with 0.2% saponin on ice
for 30 min. Cleared cell lysates were resolved by SDS-PAGE, followed by immunoblotting
with antibodies specific for SdeA and tubulin. Total L. pneumophila proteins were
resolved by SDS-PAGE to evaluate the expression of SdeA by immunoblotting with
SdeA-specific antibodies, and isocitrate dehydrogenase (ICDH) was probed as a
loading control with previously described antibodies. For intracellular growth
in D. discoideum, infection was performed at an MOI of 0.1 and the total bacterial
counts were determined at 24-h intervals as described. The enrichment of RTN4
on bacterial phagosomes was assessed by immunostaining in primary mouse
macrophages infected with the relevant L. pneumophila strains for 2 h at an MOI of
1.0. Primary mouse macrophages were obtained from A/J mice (female, 6 weeks,
Jackson Laboratory, cat. no. 000646). Neither randomization of mice nor blinding
was necessary, as mice were used only to collect primary bone marrow macrophages
for Legionella infection experiments. Immunostaining with anti-RTN4 (LSBio,
cat. no. LS-B6516-50) (1:500) was performed as described. Infection experiments
were performed in triplicate.
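
The Fig. 4e comparisons reduce triplicate CFU counts to fold changes and then to two-tailed t-test P values. The sketch below walks through that arithmetic under stated assumptions: fold change is taken as CFU at a time point divided by CFU at time 0 (the text does not specify the baseline), and the test is the classical pooled-variance Student's t-test (the text says only "two-tailed t-test", so a Welch test may equally have been used). The two-tailed P value is obtained by Simpson integration of the t density to stay within the standard library. All names and example values are hypothetical.

```python
import math
from statistics import mean, variance


def fold_change(cfu_t, cfu_0):
    """CFU at a later time point relative to the initial count (assumed baseline)."""
    return cfu_t / cfu_0


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=10_000):
    """P(|T| >= |t|) = 1 - 2 * integral of the density from 0 to |t| (Simpson's rule)."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def student_t_test(a, b):
    """Pooled-variance two-sample t-test; returns (t, df, two-tailed P)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df, two_tailed_p(t, df)


# Hypothetical triplicate fold changes for two strains (not the published data).
strain_a = [fold_change(c, 1e4) for c in (3.0e5, 3.4e5, 2.8e5)]
strain_b = [fold_change(c, 1e4) for c in (1.1e5, 0.9e5, 1.3e5)]
print(student_t_test(strain_a, strain_b))
```

With only three replicates per group (df = 4), the t distribution has heavy tails, so fairly large differences are needed to reach P < 0.05.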

Antibodies and immunoblotting. For immunoblotting, samples resolved by SDS-PAGE
were transferred onto 0.2-µm nitrocellulose membranes (Pall Life Sciences,
cat. no. 66485). Membranes were blocked with 5% non-fat milk and incubated with
the appropriate primary antibodies: anti-SdeA, 1:10,000; anti-ICDH, 1:10,000;
anti-tubulin (DSHB, E7), 1:10,000. Membranes were then incubated with an
appropriate IRDye infrared secondary antibody (dilution: 1:20,000) and scanned
using an Odyssey infrared imaging system (LI-COR Biosciences).

Ethical compliance. Animal protocols used in the study were approved by the Purdue
Animal Care and Use Committee. We complied with all the relevant ethical regulations.
No statistical methods were used to predetermine sample size. The experiments
were not randomized and the investigators were not blinded to allocation
during experiments and outcome assessment. 
Reporting summary. Further information on experimental design is available in
the Nature Research Reporting Summary linked to this paper. 

Data availability. Structure coordinates are available from the Protein Data Bank 
under accession code 6GOC. Small-angle X-ray scattering data and models are 
available from SASBDB (https://www.sasbdb.org/) under accession number 
SASDD65. Full gel source data can be found in Supplementary Fig. 1. The data 
that support the findings of this study are available from the corresponding author 
upon request. 
Extended Data Fig. 1 | Catalytic core: SdeA213-907. a, Limited proteolysis
of SdeA fragment 193-998 and subsequent analysis of the fragments by
Coomassie-stained SDS gel and total mass analysis by mass spectrometry.
b, In vitro ubiquitination of Rab33b by SdeAFL and SdeA213-907. c, Left,
scattering profile of SdeA213-907 with calculated scattering curve from
crystal structure. Guinier region is shown in inset. Top right, ab initio bead
model from DAMMIN superimposed with crystal structure and shown
in two orientations. Bottom right, pair distance distribution plot and
DAMMIN model fitting results. Experiments were repeated independently
twice with similar results (a, b). For gel source data, see Supplementary
Fig. 1.
Extended Data Fig. 2 | Role of AHL in SdeA. a, ε-NAD+ hydrolysis
assay in the presence of SdeA (SdeA213-907 or SdeA213-907ΔAHL) and
ubiquitin (wild-type or R42A/R72A). b, c, In vitro ubiquitination assay
with mutations in loops connecting AHL to PDE and the mART catalytic
core in SdeA213-907 (b) and in SdeAFL (c). d, Substrate ubiquitination and
ubiquitin modification by SdeAFL and SdeA213-907 in HEK293T cells.
Abcam Ub and Cell Signalling Ub antibodies were used to monitor the
levels of unmodified ubiquitin and total ubiquitin, respectively. e, Limited
proteolysis analysis of various SdeA constructs. All experiments were
repeated independently twice with similar results. For gel source data, see
Supplementary Fig. 1.
Extended Data Fig. 3 | Characterization of the mART domain.
a, Superimposition of the mART core of SdeA with that of the NAD+-bound
structure of iota toxin (PDB: 4HOY) from Clostridium perfringens, which
ADP-ribosylates actin of host cells. Residues in SdeA that are predicted
to be important for NAD+ binding and hydrolysis are labelled. b, In vitro
ubiquitination assay with NAD+ binding site mutants in the mART core
of SdeA213-907. c, Residues at the interface between the mART core and
AHL in proximal conformation (AHLprox). d, In vitro ubiquitination assays
with the mutants of SdeA213-907 mART core-AHLprox interface residues
indicated in c. e, Comparison of ε-NAD+ hydrolysis by SdeA213-907 and
NAD+ binding site mutants and mutants disrupting the predicted mART
core-AHLprox interaction. f, In vitro ubiquitination assays with the
mutants of SdeAFL mART core-AHLprox interface residues indicated in c.
Experiments were repeated independently twice with similar results. For
gel source data, see Supplementary Fig. 1.
Extended Data Fig. 4 | Interaction between PDE and mART core.
a, Details of interaction between SdeA PDE and mART core. Important
residues mediating the interaction are indicated in the insets.
b, c, Testing in vitro substrate ubiquitination and ubiquitin modification
(c). Experiments were repeated twice independently with similar results.
For gel source data, see Supplementary Fig. 1.
ubiquitination by SdeA PDE mutants. b, I
with PDE catalytic site mutants. c, In vitro Rab33b ubiquitination assays
with GFP-SdeAFL PDE catalytic site mutants purified from HEK293T
cells. These experiments were repeated independently twice with similar
results. For gel source data, see Supplementary Fig. 1.
Extended Data Fig. 7 | Substrate specificity of SdeA. a, b, Fragmentation
spectra of the bridged peptide indicating RTN4B ubiquitination sites.
These experiments were done once. c, Sequence motif of target serine
sequences of SdeA as computed by Seq2Logo.
Extended Data Fig. 8 | Chemical inhibition of SdeA. a, Chemical
structure of adenosine 5'-thiomonophosphate (5'-AMPS).
b, 5'-AMPS-mediated inhibition of Rab33b PR-ubiquitination by SdeAFL,
SdeA 13-597 and SdeA213-907 in the presence of ADPR-Ub. c, 5'-AMPS-mediated
inhibition of RTN4B PR-ubiquitination by SdeAFL. d, In vitro
ubiquitination by SdeAFL in the presence of increasing concentrations
of 5'-AMPS and AMP. e, Apparent inhibition constants (Ki) of AMP
and 5'-AMPS against SdeAFL, calculated from quantification of substrate
ubiquitination shown in d. f, PDE domain of SdeC ubiquitinates Rab33b
and RTN4B. g, Effect of 5'-AMPS on ubiquitination by SdeC PDE. These
experiments were done twice independently with similar results. For gel
source data, see Supplementary Fig. 1.
Extended Data Fig. 9 | Effect of SdeA substrate-binding mutations
in vivo. a, In vitro Rab33b ubiquitination assays with GFP-SdeAFL
substrate-binding mutants purified from HEK293T cells. b, Expression
and translocation of SdeA using wild-type and various mutant strains
of Legionella. c, CFU fold change monitored in wild-type Legionella and
ΔsidEs strain complemented with substrate ubiquitination defective
Extended Data Fig. 10 | Size-exclusion chromatography profile
of SdeAFL. SdeAFL shows dimeric behaviour on a size-exclusion
chromatography column (Superdex 200 16/60). This experiment was
repeated twice independently with similar results. For the inset, n = 1.
MW, molecular weight. For gel source data, see Supplementary Fig. 1.
Activity-dependent neuroprotective protein recruits
HP1 and CHD4 to control lineage-specifying genes

Veronika Ostapcuk, Fabio Mohn, Sarah H. Carl, nie Basters, Daniel Hess, Vytautas Iesmantavicius,
Lisa Lampersberger


De novo mutations in ADNP, which encodes activity-dependent 
neuroprotective protein (ADNP), have recently been found 
to underlie Helsmoortel-Van der Aa syndrome, a complex 
neurological developmental disorder that also affects several other 
organ functions!. ADNP is a putative transcription factor that is 
essential for embryonic development”. However, its precise roles 
in transcriptional regulation and development are not understood. 
Here we show that ADNP interacts with the chromatin remodeller 
CHD4 and the chromatin architectural protein HP1 to form a 
stable complex, which we refer to as ChAHP. Besides mediating 
complex assembly, ADNP recognizes DNA motifs that specify 
binding of ChAHP to euchromatin. Genetic ablation of ChAHP 
components in mouse embryonic stem cells results in spontaneous 
differentiation concomitant with premature activation of lineage- 
specific genes and in a failure to differentiate towards the neuronal 
lineage. Molecularly, ChAHP-mediated repression is fundamentally 
different from canonical HP1-mediated silencing: HP1 proteins, in 
conjunction with histone H3 lysine 9 trimethylation (H3K9me3), 
are thought to assemble broad heterochromatin domains that are 
refractory to transcription. ChAHP-mediated repression, however, 
acts in a locally restricted manner by establishing inaccessible 
chromatin around its DNA-binding sites and does not depend on 
H3K9me3-modified nucleosomes. Together, our results reveal 
that ADNP, via the recruitment of HP1 and CHD4, regulates the 
expression of genes that are crucial for maintaining distinct cellular 
states and assures accurate cell fate decisions upon external cues. 
Such a general role of ChAHP in governing cell fate plasticity 
may explain why ADNP mutations affect several organs and body 
functions and contribute to cancer progression!**. Notably, we 
found that the integrity of the ChAHP complex is disrupted by 
nonsense mutations identified in patients with Helsmoortel-Van 
der Aa syndrome, and this could be rescued by aminoglycosides 
that suppress translation termination®. Therefore, patients might 
benefit from therapeutic agents that are being developed to promote 
ribosomal read-through of premature stop codons” 

ADNP contains nine N-terminal zinc-fingers and a C-terminal 
homeobox domain, strongly suggesting transcription factor activity®. 
Although originally associated with neuronal function’, ADNP is 
essential for embryonic development in mice: Adnp-deficient mouse 
embryos exhibit neural tube closure defects and die at days 8.5-9.5 of 
gestation. Two studies in knockout mouse embryos identified potential 
ADMP target genes that are implicated in cell differentiation and the 
maintenance of stem cells™!”. 

To dissect the molecular activity of ADNP, we exploited mouse 
embryonic stem (ES) cells'!. We first inserted a Flag- AviTag at the 
endogenous Adnp gene’* (Extended Data Fig. lac) and performed 
chromatin immunoprecipitation coupled to next-generation sequenc- 
ing (ChIP-seq) to interrogate putative ADNP-DNA interactions 
genome-wide. This revealed 15,026 sites that are significantly enriched 
for ADNP (Fig. la, b and Supplementary Table 1). Notably, most (61%) 


, Matyas Flemr!, Aparna Pandey!, Nicolas H. Thomil, Joerg 'Betschinger! & Marc Bithler!2 


of the peaks were found in introns or proximal of annotated transcrip- 
tion start sites. The remaining peaks were located promoter distal in 
intergenic regions (Extended Data Fig. 1d, e). To analyse the function of ADNP, we generated homozygous Adnp knockout mouse ES cells (Extended Data Fig. 2a, b). Compared with wild-type ES cells, Adnp−/− cells displayed gross morphological changes and appeared to differentiate spontaneously as they started spreading out of characteristically densely packed ES cell colonies (Fig. 1c, d). In addition, Adnp−/− cells displayed heterogeneous activity of the pluripotency-associated marker alkaline phosphatase (Fig. 1d). Transcriptome profiling by RNA sequencing (RNA-seq) revealed that most of the genes with altered mRNA levels in Adnp−/− cells were upregulated (Extended Data Fig. 2c). Many genes bound by ADNP and displaying increased expression in the absence of ADNP encode known lineage specification factors, such as GATA4, GATA6, BMP1 or SOX17 (Supplementary Table 2). For example, Gata4 is expressed predominantly in mesoderm- and endoderm-derived tissues, and forced Gata4 expression in mouse ES cells induces differentiation towards extra-embryonic endoderm. Moreover, genes upregulated in Adnp−/− cells were enriched for Gene Ontology terms related to differentiation and development (Extended Data Fig. 2d and Supplementary Table 3). We also observed a group of genes that were upregulated both in Adnp−/− cells and in extra-embryonic endoderm stem-cell lines, which can be differentiated from mouse ES cells (Extended Data Fig. 2e). To gain further insight into the biological role of ADNP, we differentiated wild-type and Adnp−/− ES cells towards neuronal precursor cells (Fig. 1c, d) using an established differentiation protocol. Adnp−/− ES cells formed smaller embryoid bodies and showed increased cell death after differentiation when compared to wild-type cells (Fig. 1d). Nanog and Oct4 (also known as Pou5f1) expression was downregulated in both wild-type and Adnp−/− cells, indicating successful exit from pluripotency (Fig. 1e). However, whereas Adnp+/+ cells started expressing neural markers such as Pax6 and Ngn2 (also known as Neurog2) over the course of differentiation, Adnp−/− cells failed to induce neural genes (Fig. 1f). Instead, the expression of Gata4 and Sox17 was specifically induced in Adnp−/− cells (Fig. 1g), indicating misspecification towards the endodermal lineage under conditions that normally induce neuronal fate. This ES cell phenotype is reminiscent of Adnp−/− mouse embryos, which show a developmental delay, fail to induce Pax6 and suffer from defective neural tube closure. Thus, ADNP is required to restrain the expression of lineage-specifying genes in ES cells and for specification towards the neuronal lineage upon external differentiation cues.
These results are consistent with previously reported repressive activity of ADNP when artificially targeted to a reporter gene. Furthermore, ADNP was shown to co-immunoprecipitate with the SWI/SNF chromatin remodelling complex or with proteins of the heterochromatin protein 1 (HP1) family. To unambiguously identify ADNP-interacting proteins in mouse ES cells, we subjected ADNP tagged endogenously with a Flag-AviTag to tandem-affinity purification coupled to liquid chromatography tandem mass spectrometry (TAP-LC-MS/MS). Besides ADNP, we observed strong enrichment of HP1β, HP1γ and CHD4, but not of SWI/SNF complex subunits. These interactions were preserved under 500 mM NaCl, showing that ADNP stably interacts with CHD4 and the HP1β and HP1γ proteins in ES cells (Fig. 2a). To corroborate this, we inserted a Flag-AviTag into the endogenous Cbx1, Cbx3 and Cbx5 genes, which encode the three mammalian HP1 isoforms HP1β, HP1γ and HP1α, respectively (Extended Data Fig. 3a). Both ADNP and CHD4 were highly enriched in HP1β and HP1γ purifications (Fig. 2b and Extended Data Fig. 3). By contrast, CHD4 did not co-purify with HP1α (Extended Data Fig. 3b, e), and ADNP was 100-fold and 235-fold less abundant than in HP1β and HP1γ purifications, respectively (Extended Data Fig. 3g).

To verify that ADNP, HP1 and CHD4 form a stable complex via direct protein–protein interactions, we set out to reconstitute complex formation in vitro with recombinant human ADNP, HP1γ and CHD4 (Extended Data Fig. 4). Co-lysis of cells expressing HP1γ, ADNP and CHD4 resulted in the formation of a trimeric complex, which was preserved after streptavidin affinity purification, anion-exchange chromatography and size-exclusion chromatography (SEC) (Fig. 2c, d). Subsequent experiments with full-length and truncated variants of the proteins (Extended Data Fig. 4) revealed that ADNP is at the core of the complex and interacts with CHD4 via its N terminus and with the chromoshadow domain (CSD) of HP1 via its C-terminal domain (Fig. 2e), probably through the PXVXL motif (in which X denotes any amino acid). In conclusion, CHD4, ADNP and HP1β/γ form a stable protein complex, which we refer to as ChAHP.

Fig. 1 | ADNP binds and represses lineage-specifying genes. a, Heat map of ADNP ChIP-seq enrichment across all significant peaks (n = 15,026) in the mouse genome. Each row represents a 6-kb window centred on peak midpoints, sorted by the ADNP ChIP signal. Input signals for the same windows are shown on the right. Average peak intensity of n = 3 biological replicates. RPM, reads per million. b, UCSC genome browser shots of three endoderm specification factors (Igfbp4, Bmp1 and Gata4). ChIP-seq profiles for ADNP and input, and RNA-seq profiles for wild-type (Adnp+/+) and ADNP-knockout (Adnp−/−) mouse ES cells. Both ChIP-seq and RNA-seq profiles are normalized for library size. The experiment was repeated three times. c, Wild-type ES cells differentiate to neural progenitors (NP) in response to external cues. Consecutive withdrawal of 2i and leukaemia inhibitory factor (LIF) results in the formation of cellular aggregates (embryoid bodies, EB), which further differentiate into neural progenitors upon the addition of retinoic acid (RA) at day 4. d, Phase-contrast images (original magnification, ×5) of Adnp+/+ and Adnp−/− mouse ES cells stained for alkaline phosphatase when grown in 2i, serum and LIF (day 0), and during differentiation towards the neuronal lineage (days 2, 4 and 8). The experiment was repeated three times. e–g, mRNA expression profiles of genes specifying pluripotent cells (e; Nanog and Oct4), or cells of the neural (f; Pax6 and Ngn2) or the endodermal (g; Gata4 and Sox17) lineages from two independent differentiation experiments (light and dark coloured dots). Values normalized to Tbp mRNA are shown relative to the Adnp+/+ parental cell line (in 2i/serum/LIF medium) for each replicate. Biological replicates were performed using independent mouse ES cell lines for each tagged protein. d, day.

Next we performed ChIP-seq with endogenously tagged HP1α, HP1β and HP1γ and consulted a published dataset for CHD4.


Corroborating the biochemistry, all of the ADNP peaks (n = 15,026, Fig. 1) showed enrichment for CHD4 and HP1β/γ (Fig. 2f). Of the HP1 isoforms, the average HP1γ occupancy was the highest, HP1β was moderately enriched, and HP1α was barely detectable at those sites (Fig. 2f and Extended Data Fig. 5a–d). This confirms our TAP-LC-MS/MS results and indicates that HP1γ is the dominant isoform in ChAHP, whereas HP1β is present in a minor fraction of ChAHP complexes or forms sub-stoichiometric heterodimers with HP1γ. In line with a partial redundancy of HP1β and HP1γ, we observed that the average HP1β occupancy on all ChAHP-bound sites was greatly increased in the absence of HP1γ, whereas HP1γ occupancy remained similar in the absence of HP1β (Extended Data Fig. 5c). Thus, HP1γ is the predominant HP1 member of ChAHP in ES cells.
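The occupancy comparisons above, and the heat maps in Figs. 1a and 2f, rest on two simple operations: scaling read counts to reads per million (RPM) so that libraries of different depth are comparable, and averaging signal across fixed 6-kb windows centred on peak midpoints. The sketch below illustrates both under simplifying assumptions: peaks and per-base coverage are plain Python lists rather than BED or bigWig files, and the helper names (`rpm`, `centred_window`, `average_profile`) are hypothetical, not the authors' pipeline.

```python
from __future__ import annotations

from statistics import mean


def rpm(read_count: int, total_mapped_reads: int) -> float:
    """Scale a raw read count to reads per million mapped reads (RPM)."""
    return read_count * 1_000_000 / total_mapped_reads


def centred_window(peak_start: int, peak_end: int, width: int = 6_000) -> tuple[int, int]:
    """Return (start, end) of a window of `width` bp centred on the peak midpoint."""
    midpoint = (peak_start + peak_end) // 2
    half = width // 2
    return midpoint - half, midpoint + half


def average_profile(coverage: list[float], peaks: list[tuple[int, int]],
                    width: int = 6_000) -> list[float]:
    """Average per-base signal across windows centred on each peak.

    Each window is one row (as in a heat map); windows that run off either
    end of `coverage` are skipped so that all rows have the same length.
    """
    rows = []
    for start, end in peaks:
        lo, hi = centred_window(start, end, width)
        if lo < 0 or hi > len(coverage):
            continue
        rows.append(coverage[lo:hi])
    # Column-wise mean: position i is averaged over all retained peaks.
    return [mean(column) for column in zip(*rows)]
```

On real data the same computation would run per chromosome on RPM-scaled coverage, and heat-map rows would additionally be sorted by their summed signal.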

of the ChAHP complex. ADNP, CHD4 and HP1γ were expressed in Hi5 insect cells. Strep-tagged HP1γ (S-HP1γ) was pulled down with co-purifying ADNP and CHD4, followed by separation on size-exclusion chromatography (SEC) (Extended Data Fig. 4b, c). The fraction containing purified ChAHP was loaded on SDS–PAGE (c) and reinjected on SEC (d). For gel source data, see Supplementary Fig. 1. All experiments were performed at least twice. e, Scheme depicting ChAHP subunit interactions (see Extended Data Fig. 4). ADNP N-terminal zinc fingers are necessary for the interaction with yet-to-be-determined CHD4 residues. The PXVXL motif in ADNP mediates the interaction with the CSD of HP1. Protein domains as predicted by InterPro. CD, chromodomain. f, Heat map of ADNP, CHD4, HP1γ, HP1β, HP1α and H3K9me3 ChIP-seq enrichment across all euchromatic sites bound by ChAHP (top) or heterochromatic sites bound by HP1γ (bottom). Each row represents a 6-kb window centred on the ADNP or HP1γ peak midpoint, respectively. Rows are sorted by ADNP (top) or HP1γ (bottom) ChIP

HP1 proteins recognize and bind to methylated H3K9 through the chromodomain, suggesting that HP1 might target ChAHP to H3K9-methylated nucleosomes. Consistent with previous immunostaining experiments, we observed slight ADNP and CHD4 association with H3K9me3-marked chromatin. However, most of the highly enriched ADNP, and respective ChAHP, peaks were located in euchromatin (Fig. 2f). Consistent with repressive activity of ChAHP, histone modifications associated with active transcription were also absent from these sites (Extended Data Fig. 5e, f). In line with an H3K9me3-independent recruitment of ChAHP, HP1γ with mutations in the chromodomain that abolish H3K9me binding still bound to ChAHP target genes (Extended Data Fig. 5g). By contrast, the binding of HP1γ was lost at all ChAHP-bound sites in the absence of ADNP, whereas HP1γ bound to genomic regions with H3K9me3-modified nucleosomes remained largely unaffected in Adnp−/− cells (Fig. 3a). These results suggest that ADNP targets the complex to euchromatic sites in a sequence-specific manner. Motif analysis of ADNP-bound loci revealed several significant DNA motifs. The most highly enriched motif (CGCCCYCTNSTG) was present in 63% of peaks (P = 1 × 10^-1,038), and several motifs often co-occurred at bound genomic loci (Extended Data Fig. 6). To confirm that ChAHP is indeed recruited via sequence-specific binding of ADNP, we deleted the predicted ADNP-binding motif (Δmotif) at the endogenous Igfbp4 locus (Fig. 3b). Validating GCCCCCTGGAG as an ADNP-binding site, ADNP enrichment was specifically lost at the Igfbp4 locus in Igfbp4Δmotif/Δmotif cells, whereas another ChAHP target gene (Bmp1) remained unaffected (Fig. 3c). Importantly, the binding of CHD4 and HP1γ was also depleted at the Igfbp4 but not the Bmp1 locus in Igfbp4Δmotif/Δmotif cells (Fig. 3d, e). Consistent with ChAHP-mediated target repression, we observed significantly increased Igfbp4 but not Bmp1 mRNA levels in Igfbp4Δmotif/Δmotif cells (Fig. 3f).
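The consensus CGCCCYCTNSTG is written in IUPAC nucleotide codes (Y = C or T, S = G or C, N = any base). A minimal way to ask what fraction of peak sequences carry such a consensus is to translate it into a regular expression and scan both strands. This is only an illustration: it checks for exact consensus matches and does not reproduce the statistical motif analysis reported above, and the sequences in the usage check are made up.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def motif_regex(consensus):
    """Compile an IUPAC consensus (e.g. 'CGCCCYCTNSTG') into a regex."""
    return re.compile("".join(IUPAC[base] for base in consensus.upper()))


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_motif(seq, pattern):
    """True if the motif occurs on either strand of `seq`."""
    seq = seq.upper()
    return bool(pattern.search(seq) or pattern.search(reverse_complement(seq)))


def fraction_with_motif(sequences, consensus):
    """Fraction of sequences (e.g. peak regions) carrying at least one match."""
    pattern = motif_regex(consensus)
    return sum(has_motif(s, pattern) for s in sequences) / len(sequences)
```

Scanning both strands matters because a transcription factor motif can sit on either strand of a peak; a motif-discovery tool would additionally compare match frequency against a background sequence set.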
Identification of ChAHP strongly suggests that ADNP exerts its repressive function with the help of HP1. Indeed, Cbx1−/−Cbx3−/−Cbx5−/− triple-knockout cells revealed a distinct group of genes that was also upregulated in Adnp−/− cells. This was not evident in Cbx single- or double-knockout cells (Extended Data Fig. 7 and Supplementary Table 4). This suggests functional replacement by HP1α in the absence of HP1β and HP1γ, even though HP1α only weakly interacts with ADNP and is not highly enriched at ChAHP target genes (Extended Data Figs. 3g, 5d). The fact that overall gene expression was not greatly affected if at least one HP1 isoform was present provides a general indication that HP1 isoforms can act partially redundantly to repress target genes in ES cells (Extended Data Fig. 7b).

The requirement of HP1 for ChAHP-mediated repression prompted us to revisit ADNP mutations found in patients with Helsmoortel–Van der Aa syndrome. Most are frameshift or nonsense mutations that result in C-terminally truncated ADNP that lacks the homeobox domain and the HP1 interaction motif (Extended Data Fig. 1b). This suggests that mutant ADNP fails to assemble functional ChAHP and/or to bind its target genes. To test this, we introduced a patient-specific nonsense mutation upstream of the homeobox domain at amino acid position 718 (AdnpPTC718; Extended Data Fig. 8a; corresponds to Tyr719 in human ADNP). ADNPPTC718 failed to co-purify with HP1β and HP1γ, whereas the interaction with CHD4 was preserved (Extended Data Fig. 8b). Consistent with the requirement of HP1 for silencing, we observed increased expression of ADNP target genes in cells that express ADNPPTC718 (Extended Data Fig. 8c). The ADNPPTC718 protein still bound its target site at the Igfbp4 locus (Extended Data Fig. 8d), indicating that the homeobox domain is dispensable for DNA binding but might assist in target repression. These results demonstrate that patients with nonsense mutations in the ADNP gene cannot assemble fully functional ChAHP complexes. To test whether this could potentially be restored pharmacologically, we treated ADNPPTC718-expressing cells with gentamycin or paromomycin, two aminoglycoside antibiotics that promote translational read-through. Indeed, gentamycin treatment promoted read-through of PTC718 (Extended Data Fig. 8e, f)
clinical utility of aminoglycoside therapy is limited by low efficacy and serious toxicities.

Fig. 3 | DNA sequence specifies ChAHP association with euchromatin. a, Heat map of HP1γ ChIP-seq enrichment in euchromatic ChAHP-bound sites (top) or heterochromatic HP1-bound sites (bottom) in Adnp+/+ and Adnp−/− mouse ES cells. Each row represents a 6-kb window centred on the respective peak signal. Rows are sorted by mean ChIP enrichment. Average peak intensity of n = 3 biological replicates (that is, three independent ES cell lines). b, Schemes depicting the location of ADNP-binding motifs in the Igfbp4 (left) and Bmp1 (right) genes and the Igfbp4 locus with the motif deletion Igfbp4Δmotif/Δmotif (middle). c, ChIP-qPCR measuring ADNP enrichments at Igfbp4 and Bmp1 promoters in Igfbp4Δmotif/Δmotif cells compared to the parental line (Igfbp4+/+). n = 3 biological replicates. d, ChIP-qPCR measuring CHD4 enrichments at Igfbp4 and Bmp1 promoters in Igfbp4Δmotif/Δmotif cells compared to the parental line. n = 6 biological replicates. e, ChIP-qPCR measuring HP1γ enrichments at Igfbp4 and Bmp1 promoters in Igfbp4Δmotif/Δmotif cells compared to the parental line. n = 3 biological replicates. f, qRT-PCR measurement of Igfbp4 and Bmp1 mRNA levels in wild-type and Igfbp4Δmotif/Δmotif cells. n = 3 biological replicates. P values in c–f were calculated using two-tailed unpaired unequal-variance t-tests. Centre values denote the mean; error bars denote s.d.

Fig. 4 | ChAHP obstructs chromatin accessibility. a, Average accessibility of loci bound by NRF1 (top) and SOX2
measured by ATAC-seq. Profiles represent averaged biological replicates
and Adnp−/− (red) mouse ES cells. Profiles represent averaged biological replicates (n = 4). c, Heat map showing ATAC-seq read coverage in a 2-kb window around all ADNP peaks normalized by library depth in Adnp+/+ (left) and Adnp−/− (right) ES cells.

Finally, we set out to investigate the molecular function of ChAHP.
heterochromatin assembly, we investigated a possible role of ChAHP
(assay for transposase-accessible chromatin using sequencing). Many transcription factors, such as NRF1, generate locally accessible regions at their DNA-binding sites (Fig. 4a). Unexpectedly, we did not observe such footprints for ADNP. Instead, chromatin across ChAHP-bound
loci was largely devoid of an ATAC-seq signal (Fig. 4b, c). This suggests that ChAHP is bound to chromatin with impaired accessibility or, conversely, that the binding of ChAHP renders chromatin inaccessible. Notably, all ChAHP-bound sites became readily accessible in the absence of ADNP, whereas ChAHP-independent control loci such as NRF1- or SOX2-binding sites, as well as a 'random' set of genomic loci (ADNP peaks shifted by 10 kb), showed no difference in accessibility (Fig. 4a–c). Moreover, the opening of chromatin in Adnp−/− cells was restricted to a few hundred base pairs around ChAHP-binding sites, and the surrounding regions remained inaccessible (Fig. 4c). Thus, rather than assembling broad, inaccessible domains of chromatin, ChAHP denies direct access to its cognate DNA-binding sites.
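Two design choices in this ATAC-seq comparison can be made concrete: the 'random' control set is each ADNP peak moved 10 kb along the chromosome (so it keeps the peaks' local genomic context), and accessibility is read coverage in a 2-kb window normalized by library depth. The sketch below builds the shifted intervals and compares depth-normalized signal between genotypes. Per-base coverage as a Python list, the pseudocount and the helper names are assumptions for illustration only.

```python
from math import log2


def shift_intervals(peaks, offset=10_000):
    """Build a matched control set by moving each (start, end) interval `offset` bp."""
    return [(start + offset, end + offset) for start, end in peaks]


def window_signal(coverage, peak, total_reads, width=2_000):
    """Coverage summed over a `width`-bp window at the peak midpoint, per million reads.

    The window is clipped at the ends of `coverage` (i.e. the chromosome).
    """
    start, end = peak
    mid = (start + end) // 2
    lo = max(0, mid - width // 2)
    hi = min(len(coverage), mid + width // 2)
    return sum(coverage[lo:hi]) * 1_000_000 / total_reads


def log2_fold_change(knockout, wild_type, pseudocount=1.0):
    """log2 ratio of knockout to wild-type signal; the pseudocount avoids log(0)."""
    return log2((knockout + pseudocount) / (wild_type + pseudocount))
```

Under this scheme, a site that opens in Adnp−/− cells would show a clearly positive log2 fold change, whereas the shifted control windows would be expected to stay near zero.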

In summary, we have discovered ChAHP, a gene-regulatory complex that consists of the chromatin remodeller CHD4, the DNA-binding factor ADNP, and the heterochromatin proteins HP1β and HP1γ. By locally restricting access to DNA, ChAHP prevents endodermal gene transcription in mouse ES cells and during neuroectodermal differentiation. This stabilizes cellular states and ensures correct lineage specification. Although ChAHP could directly interfere with transcribing RNA polymerase, we favour a model in which ChAHP prevents the binding of other regulatory factors, such as transcriptional activators, to DNA. The exact mode of action of ChAHP remains to be determined, but such a model would be consistent with the observation that ChAHP also binds outside gene bodies and promoters.
METHODS


No statistical methods were used to predetermine sample size. The experiments 
were not randomized, and investigators were not blinded to allocation during 
experiments and outcome assessment. 
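The figure legends report P values from two-tailed unpaired t-tests with unequal variances (Welch's t-test). As a sketch of what that test computes, the function below returns the Welch t statistic and the Welch–Satterthwaite degrees of freedom using only the Python standard library. For the two-tailed P value one would typically call `scipy.stats.ttest_ind(a, b, equal_var=False)`; which software the authors used is not stated here, so this is an assumption.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    `a` and `b` are replicate measurements (e.g. n = 3 ChIP-qPCR values per
    genotype); each group's variance is estimated separately.
    """
    se2_a = variance(a) / len(a)  # squared standard error of mean(a)
    se2_b = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df
```

With only three replicates per group the degrees of freedom are small (at most 4), so the resulting P values depend strongly on the replicate spread.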

Cell culture and genome editing. Mouse ES cells (129 × C57BL/6) were cultured on gelatin-coated dishes in ES medium containing DMEM (Gibco 21969-035), supplemented with 15% fetal bovine serum (FBS; Gibco), 1× non-essential amino acids (Gibco), 1 mM sodium pyruvate (Gibco), 2 mM L-glutamine (Gibco), 0.1 mM 2-mercaptoethanol (Sigma), 50 mg ml−1 penicillin, 80 mg ml−1 streptomycin, 3 µM glycogen synthase kinase (GSK) inhibitor (Calbiochem, D00163483), 10 µM MEK inhibitor (Tocris, PD0325901) and homemade LIF, at 37 °C in 5% CO2.

Genome editing was performed as previously published, in the absence of GSK and MEK inhibitors, in the above-described ES medium.
Generation of endogenously tagged ES cell lines. For endogenous gene tagging using TALENs, Rosa26:BirA-V5-expressing cells (cMB053 or cMB063) were transfected with 400 ng TALEN-EED, 400 ng TALEN-KKR, 100 ng pRRP reporter and 1,000 ng of donor single-stranded oligodeoxynucleotide encoding the tag sequence. The single-stranded oligodeoxynucleotides were synthesized as Ultramers by Integrated DNA Technologies, and their sequences are listed along with TALEN sequences in Supplementary Table 6. All transfections were carried out using Lipofectamine 3000 reagent (Invitrogen) at a 3 µl:1 µg DNA ratio in OptiMEM medium (Invitrogen). Transfected cells were selected by adding puromycin (2 µg ml−1) to the ES medium 24 h after transfection. After 36 h of selection, surviving cells were sparsely seeded for clonal expansion. The resulting clones were individually picked, split and screened by western blot for the desired tag integration. See Supplementary Information for the list of tagged cell lines generated in this study.
Straight knockout ES cell line generation. Cbx1−/− mouse ES cells were generated using TALENs that target the first and last coding exon, resulting in a deletion of approximately 6,000 bp (exon 2–exon 6).

Adnp−/− mouse ES cells were generated using Cas9 and TALENs that target the first and last coding exon, resulting in a deletion of approximately 7,000 bp (exon 2–exon 4). The Cas9 sgRNA sequence was cloned into the SpCas9-2A-mCherry plasmid. Sequences of TALENs and Cas9 gRNA can be found in Supplementary Table 6. See Supplementary Information for the list of straight knockout cell lines generated in this study.

Conditional ES cell line generation. The Cbx3fl/fl cell line was generated as described. For the Cbx5fl/fl conditional cell line, a mouse ES cell line containing an integration of the CreERT2 recombinase fusion in the Rosa26 locus (cMB052 or cMB063) was transfected with TALENs cutting before and after the third exon. Single-stranded oligodeoxynucleotides with corresponding homology arms and loxP sites for integration were also included in the transfection mix (see Supplementary Table 6 for sequences). Clones were screened for homozygous integration of both loxP sites. A cell line with bi-allelic integration of both loxP sites was tested for recombination efficiency by treating the cells with 0.1 µM 4-hydroxytamoxifen (4-OHT; Sigma), followed by western blot or quantitative PCR with reverse transcription (qRT-PCR).

Transient expression experiments in ES cells. The full-length Adnp cDNA was cloned into the mammalian expression vector pEFaFB (promoter of elongation factor-1 alpha; ATG-3×Flag-Avi-GOI-2A-puromycin), creating pEFaFB-Adnp, which was then used as a template to mutate codon 718 (TAT to TAA) using the QuikChange Lightning Site-Directed Mutagenesis (SDM) Kit (Agilent), yielding the final construct pEFaFB-AdnpPTC718. Alternatively, AdnpPTC718 cDNA was cloned into the pEFaCFB vector (C-terminal 3×Flag-AviTag).
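Introducing a premature termination codon by a single base change is easy to reason about in code. Counting from the A of the ATG, codon n occupies nucleotides 3n − 2 to 3n, so codon 718 spans positions 2,152–2,154, and changing TAT (Tyr) to TAA (stop) leaves a protein of 717 residues. The sketch below uses a toy sequence, not the Adnp cDNA; `codon_span`, `mutate_codon` and `translate` are hypothetical helpers built on the standard genetic code.

```python
BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order (first, second, third base).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_span(codon_number):
    """1-based (first, last) nucleotide positions of a codon, counting from the A of ATG."""
    return 3 * codon_number - 2, 3 * codon_number


def mutate_codon(cds, codon_number, new_codon):
    """Replace 1-based codon `codon_number` of `cds` with `new_codon`."""
    start = 3 * (codon_number - 1)
    return cds[:start] + new_codon + cds[start + 3:]


def translate(cds):
    """Translate in frame until the first stop codon (stop not included)."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For example, mutating codon 2 (TAT) of the toy sequence ATGTATTGGGCA to TAA turns the translated product MYWA into a single methionine, the same kind of truncation that the 718 TAT-to-TAA change produces upstream of the homeobox domain.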

The Cbx3 cDNA was cloned into pEFaFB and then used as a template for SDM, mutating chromodomain residues Trp43 and Phe46 to Ala (Cbx3 CDmut).

For ChIP experiments, 5 × 10^6 Adnp−/− (cMB377) cells in a 10-cm dish were reverse transfected with 10 µg pEFaFB-Adnp or pEFaFB-AdnpPTC718. Alternatively, Cbx1−/−Cbx3−/−Cbx5−/− triple-knockout (4-OHT-treated cMB282) cells were transfected with 10 µg pEFaFB-Cbx3 or pEFaFB-Cbx3-CDmut. Cells were collected 48 h after transfection and further processed according to the ChIP protocol (below). For affinity purification experiments, 6 × 10^6 cells in a 15-cm dish were reverse transfected with 10 µg pEFaFB-AdnpPTC718 or pEFaCFB-AdnpPTC718. Twenty hours after transfection, cells were forward-transfected with 5 µg pEFaFB-AdnpPTC718. Finally, 48 h after the first transfection, cells were treated with 2 mg ml−1 gentamycin (Sigma, G1914) or paromomycin (Sigma, P9297) for 24 h. Cells were then collected and processed according to the affinity purification protocol (below). Transfections were carried out using Lipofectamine 3000 reagent (Invitrogen) at a 3 µl:1 µg DNA ratio in OptiMEM medium (Invitrogen), with 10 µM ROCK inhibitor (Tocris, Y27632) for increased cell survival.
Differentiation of ES cells to neuronal precursors. The cMB263 (Adnp+/+) and cMB267 (Adnp−/−) ES cell lines were differentiated as previously described, except that no feeder cells were used. Instead, cells were grown in ES cell medium containing 2i, as described above.


Western blotting. Cells were grown to confluency on 6-well plates, collected in PBS and pelleted by 2 min centrifugation at 400g; pellets were then resuspended in 100 µl protein extraction buffer (50 mM Tris-HCl, pH 7.5, 150 mM NaCl, 1% Triton X-100, 0.5 mM EDTA and 5% glycerol) supplemented with protease inhibitor cocktail (PIC; Roche), 1 mM PMSF and 1 mM dithiothreitol (DTT). Proteins were extracted for 30 min on ice, the lysates were centrifuged at 16,000g for 20 min at 4 °C, and the protein concentration in the supernatant was determined using the Bio-Rad protein assay. For western blotting, 20 µg of protein was resolved on NuPAGE-Novex Bis-Tris 4–12% gradient gels (Invitrogen), semi-dry transferred onto a polyvinylidene fluoride (PVDF) membrane, blocked for 30 min in 2.5% non-fat dry milk in TBS plus 0.05% Tween 20 (TBST), and stained with primary antibodies at 4 °C overnight. The primary antibodies used for western blotting were mouse anti-Flag (1:1,000, Sigma clone M2), goat anti-HP1α (1:1,000, Abcam, ab77256), mouse anti-HP1α (1:1,000, Millipore, mab3446), rat anti-HP1β (1:500, Serotec, MCA1946), mouse anti-HP1γ (1:2,000, Cell Signaling Technology), mouse anti-CHD4 (1:1,000, Abcam, ab70469), rabbit anti-MTA2 (1:1,000, Bethyl, A300-395A-T), rabbit anti-GATAD2B (1:1,000, Bethyl, A301-283A-T), rabbit anti-MBD3 (1:1,000, Bethyl, A302-528A-T) and rat anti-tubulin (1:5,000, Abcam clone YL1/2). Signal was detected with corresponding horseradish peroxidase (HRP)-conjugated secondary antibodies and Immobilon Western Chemiluminescent HRP Substrate (Millipore). For streptavidin staining, membranes were blocked after transfer in 2% bovine serum albumin (BSA) in TBST and incubated with streptavidin-HRP (1:20,000, Sigma) for 30 min at room temperature, followed by signal development as above.

ChIP. A confluent 10-cm culture dish of ES cells (approximately 2 × 10^7 cells) was cross-linked for 7 min at room temperature with 1% final formaldehyde solution
(Sigma, F8775) added directly to the ES medium. Cross-linking was quenched by the addition of glycine to a final concentration of 0.125 M and incubation at 4 °C for 10 min; cells were then washed twice with PBS. Cells were collected in 1 ml
PBS with PIC (Roche) and spun at 600g for 5 min at 4 °C. Cells were then resuspended in 5 ml wash solution I (10 mM Tris pH 8, 10 mM EDTA, 0.5 mM EGTA, 0.25% Triton X-100), incubated for 10 min on ice, then spun at 1,200g for 5 min at 4 °C. The remaining nuclear pellet was then resuspended in 5 ml wash solution II (10 mM Tris pH 8, 1 mM EDTA, 0.5 mM EGTA and 200 mM NaCl), incubated for 5 min on ice, then spun at 1,200g for 5 min at 4 °C. The pellet was subsequently washed in 900 µl sonication buffer (10 mM Tris-HCl pH 8.0, 1 mM EDTA and 0.1% SDS) without disturbing it, and finally resuspended in sonication buffer supplemented with PIC. Chromatin was then sonicated in Covaris 1-ml tubes for 15 min with the following settings: duty cycle 5%, peak incident power 140 W, cycles per burst 200, bath temperature 4 °C.
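Adding fixative "directly to the ES medium" to reach a given final concentration is a dilution into a growing volume: to bring an existing volume V to final concentration C_f with a stock at C_s, one adds v = C_f × V / (C_s − C_f). The helper below applies this formula. The ~37% formaldehyde stock and 10 ml of medium per 10-cm dish used in the example are assumptions, not values stated in the protocol.

```python
def stock_volume_to_add(final_conc, existing_volume, stock_conc):
    """Volume of stock to add so that (existing volume + added stock) reaches final_conc.

    Concentrations must share one unit (e.g. % or M) and volumes another (e.g. ml).
    Assumes the existing volume contains none of the solute.
    """
    if not 0 < final_conc < stock_conc:
        raise ValueError("final concentration must be positive and below the stock concentration")
    return final_conc * existing_volume / (stock_conc - final_conc)


# Example under the stated assumptions: ~37% stock into 10 ml medium for 1% final.
volume_ml = stock_volume_to_add(final_conc=1, existing_volume=10, stock_conc=37)
```

The simpler C1V1 = C2V2 shortcut (0.27 ml here) slightly underestimates the volume because it ignores that the added stock itself enlarges the total volume.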

Beads preparation. For Bio-ChIP (ChIP for proteins tagged with the Flag-Avi tag), 40 µl Dynabeads Streptavidin (Thermo Fisher) per sample, or alternatively 40 µl Protein-G Dynabeads (Thermo Fisher) per sample for ChIP with protein-specific antibodies (Ab-ChIP), were washed twice for 5 min in 0.5 ml blocking buffer (PBS, 0.5% Tween and 0.5% BSA). Streptavidin Dynabeads were then washed twice with immunoprecipitation buffer (50 mM HEPES pH 7.5, 150 mM NaCl, 1 mM EDTA, 0.1% SDS, 0.1% sodium deoxycholate and 1% Triton X-100) and stored on ice. Protein-G Dynabeads were incubated for 1 h at room temperature in blocking buffer with the desired antibody. Beads were then washed twice in blocking buffer and stored on ice. For CHD4 ChIP, 10 µg mouse anti-CHD4 (Abcam, ab70469, 3F2/4) conjugated to Protein G was used.

Immunoprecipitation and washes. For immunoprecipitation analyses, 10 μl (1%)
was kept as the input sample, and 40 μl pre-blocked Dynabeads were added to 1 ml
of sonicated chromatin in immunoprecipitation buffer and incubated overnight at
4 °C on a rotating wheel. Beads were collected on a magnetic rack for 2-3 min to
remove the supernatant between steps and washed as follows: for Bio-ChIP, twice
for 10 min with 2% SDS in TE buffer (10 mM Tris, pH 8, 1 mM EDTA), once for
10 min with high-salt buffer (50 mM HEPES pH 7.5, 1 mM EDTA, 1% Triton X-100,
0.1% sodium deoxycholate and 500 mM NaCl), once for 10 min with DOC buffer
(250 mM LiCl, 0.5% NP-40, 0.5% deoxycholate, 1 mM EDTA and 10 mM Tris pH 8)
and twice for 10 min with 1 ml TE buffer. For Ab-ChIP, beads were washed five
times with immunoprecipitation buffer, twice with DOC buffer, and twice with TE
buffer. Beads were then resuspended in 300 μl elution buffer (1% SDS and 100 mM
NaHCO3) with 6 μl RNase A (10 mg ml−1 stock) and incubated at 37 °C for 30 min
while mixing. Elution buffer was adjusted with 6 μl 0.5 M EDTA, 12 μl 1 M Tris
pH 8 and 2.5 μl Proteinase K (10 mg ml−1, Roche). Beads were incubated for 3 h at
55 °C and then overnight at 65 °C with mixing to de-crosslink. The same procedure
was followed for input samples, including RNase and proteinase K digestion. DNA
was purified using AMPure XP beads (Beckman Coulter). Quantification was
performed with the Qubit dsDNA high-sensitivity assay (Thermo Fisher).
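
As a worked check of the elution and de-crosslinking step, the final concentration of each addition follows from C_final = C_stock × V_added / V_total. The sketch below assumes that volumes are additive and that all four additions (RNase A, EDTA, Tris, proteinase K) end up in the 300 μl eluate; the helper name `final_concentrations` is hypothetical and not part of the original protocol.

```python
def final_concentrations(base_volume_ul, additions):
    """Return ({name: (final_concentration, unit)}, total_volume_ul).

    additions: list of (name, volume_ul, stock_concentration, unit).
    Uses C_final = C_stock * V_added / V_total and assumes additive volumes."""
    total_ul = base_volume_ul + sum(vol for _, vol, _, _ in additions)
    finals = {
        name: (stock * vol / total_ul, unit)
        for name, vol, stock, unit in additions
    }
    return finals, total_ul


# Reagents added to 300 ul elution buffer, with the stock concentrations given in the text.
ADDITIONS = [
    ("RNase A", 6.0, 10.0, "mg/ml"),       # 10 mg/ml stock
    ("EDTA", 6.0, 500.0, "mM"),            # 0.5 M stock
    ("Tris pH 8", 12.0, 1000.0, "mM"),     # 1 M stock
    ("Proteinase K", 2.5, 10.0, "mg/ml"),  # 10 mg/ml stock
]

if __name__ == "__main__":
    finals, total = final_concentrations(300.0, ADDITIONS)
    print(f"total volume: {total} ul")
    for name, (conc, unit) in finals.items():
        print(f"{name}: {conc:.3g} {unit}")
```

With these numbers the total volume is 326.5 μl, which corresponds to roughly 9.2 mM EDTA, 37 mM Tris and 0.077 mg ml−1 proteinase K during de-crosslinking.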
ChIP-qPCR and ChIP-seq. DNA was subjected to qPCR analysis (as described 
for qRT-PCR, below) using ChIP primers described in Supplementary 
Information. For ChIP-seq sample preparation, library construction was 
performed using the NEBNext Ultra kit (New England Biolabs) following the
manufacturer's recommendations. Libraries were sequenced on Illumina HiSeq
2500 machines, with 50-bp single-end sequencing. 
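
The text does not spell out how ChIP-qPCR signals were computed. Because 1% of the chromatin was kept as input, one common convention is to express the IP signal as percent input and, where needed, divide it by the value at a control region (the Extended Data Fig. 5g legend describes normalization to an intergenic control region). The sketch below is one such calculation under the assumption of ~100% amplification efficiency; the function names are illustrative.

```python
import math


def percent_input(ct_ip, ct_input, input_fraction=0.01):
    """Percent of input chromatin recovered in the IP.

    The input Ct is adjusted to 100% of chromatin by subtracting
    log2(1 / input_fraction) cycles (about 6.64 cycles for a 1% input),
    assuming one doubling per cycle."""
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)


def fold_over_control(ct_ip_target, ct_input_target, ct_ip_control, ct_input_control):
    """Percent input at a target locus divided by percent input at a control region."""
    return (percent_input(ct_ip_target, ct_input_target)
            / percent_input(ct_ip_control, ct_input_control))
```

For example, an IP Ct of 26 against a 1% input Ct of 24 corresponds to 0.25% of input. A control region whose IP Ct is 30 at the same input Ct gives a 16-fold enrichment of the target over the control.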

qRT-PCR and RNA-seq. For qRT-PCR experiments, total RNA was extracted 
from ES cells with the Absolutely RNA Microprep Kit (Stratagene). Total RNA 
(500 ng) was reverse transcribed with the PrimeScript RT kit (Clontech). qRT-PCR
was performed on a CFX96 Real-Time PCR System (Bio-Rad) using the
SsoAdvanced SYBR Green Supermix (Bio-Rad, 172-5264). Relative RNA levels
were calculated from Ct values according to the ΔCt method and normalized to
Tbp mRNA levels where applicable. For RNA-seq, total RNA (isolated as above)
was subjected to ribosomal RNA depletion using the Ribo-Zero kit (Illumina)
followed by library construction using the ScriptSeq V2 library preparation kit 
(Illumina). 
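
The ΔCt calculation named above can be sketched as follows. The reference (Tbp) Ct is subtracted from the target Ct, and the difference is converted to a linear ratio, 2^−ΔCt. Two choices in this sketch are assumptions: technical replicates are averaged before the subtraction, and both amplicons are taken to have equal, ~100% efficiency.

```python
from statistics import mean


def relative_expression(ct_target_reps, ct_reference_reps):
    """ΔCt method: expression of a target gene relative to a reference gene.

    Technical-replicate Ct values are averaged first; assumes both amplicons
    amplify with ~100% efficiency (one doubling per cycle)."""
    delta_ct = mean(ct_target_reps) - mean(ct_reference_reps)
    return 2.0 ** (-delta_ct)
```

A target with a mean Ct of 25 against a Tbp mean Ct of 22 gives ΔCt = 3, which corresponds to 2^−3 = 0.125 of the Tbp level.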

ATAC-seq. ATAC-seq was performed following a previously published protocol
using 50,000 Adnp+/+ or Adnp−/− mouse ES cells. The experiment was performed
in biological replicates using two independent isogenic cell lines for each genotype.
Libraries were paired-end sequenced (2 × 75 bp) using an Illumina NextSeq 500
device. 

Affinity purification for LC-MS/MS. All affinity purifications in this work were
performed according to the following protocol, with the exception of ADNP?1C718
affinity purifications (see below). Cells were grown to confluency on 10-cm dishes,
collected in PBS, and pelleted by centrifugation at 400g for 2 min. All subsequent
steps were performed on ice or at 4 °C. Pellets were resuspended in 3 ml of nuclear
extract buffer 1 (NEB1; 20 mM HEPES, 10 mM KCl, 1 mM EDTA, 0.1 mM Na3VO4,
0.2% NP-40, 10% glycerol, 1 mM DTT and 1× PIC) followed by centrifugation
at 1,000g for 3 min. Pellets were resuspended in 1 ml NEB1 buffer and incubated
on ice for 10 min, followed by dounce homogenization. Isolated nuclei were
collected by centrifugation at 1,000g for 15 min and carefully washed twice with 1 ml
NEB1 without disturbing the pellet. Pellets were then resuspended in 0.5 ml of
nuclear extract buffer 2 (NEB2; 20 mM HEPES, 10 mM KCl, 1 mM EDTA, 0.1 mM
Na3VO4, 350 mM NaCl, 20% glycerol, 1 mM DTT and 1× PIC), dounce
homogenized (20 up-and-down strokes), incubated for 30 min, and finally spun at 16,000g
for 30 min. Protein concentration was determined using the Bradford assay, and
approximately 250 μg of nuclear extract was used per affinity purification. The
protein lysates were adjusted to affinity purification buffer (350 mM or 500 mM
NaCl, 20 mM Tris-HCl, pH 7.5, 0.3% NP-40, 1 mM EDTA, 10% glycerol, 1 mM
DTT and 1× PIC), added to 20 μl anti-Flag-M2 Dynabeads (Sigma), and
incubated overnight rotating at 4 °C. Dynabeads were washed the next day in affinity
purification buffer (4 × 10 min), followed by 3 × 15-min elutions of bound proteins
with 3× Flag peptide (final concentration 0.3 mg ml−1 in affinity purification buffer,
Sigma). Next, elutions were pooled and added to the washed Streptavidin Dynabeads
(Thermo Fisher), and incubated overnight rotating at 4 °C. Streptavidin Dynabeads
were washed the next day with affinity purification buffer (4 × 10 min), followed
by a wash with affinity purification buffer without NP-40. For single-step affinity
purification, Flag purification was omitted, and lysates were directly applied to
the Streptavidin Dynabeads. The enriched proteins were digested directly on the
Dynabeads with 0.1 mg ml−1 trypsin in digestion buffer (50 mM Tris pH 8.0, 1 mM
CaCl2 and 1 mM TCEP).

For ADNP?'C718 affinity purification, cells from 2 × 15-cm dishes per replicate
were used; collection and nuclear lysate isolation were as described above. Next,
400 μg of nuclear lysates were used for single-step purification with Streptavidin
Dynabeads in affinity purification buffer (2 h incubation at 4 °C), followed by
washes (see above). The enriched proteins were digested directly on the Dynabeads
with 0.2 μg Lys-C in 5 μl digestion buffer (3 M guanidinium chloride, 20 mM EPPS,
pH 8.5, 10 mM CAA and 5 mM TCEP) for 2 h at room temperature. Next,
samples were diluted with 50 mM HEPES, pH 8.5, and digested with 0.2 μg trypsin
overnight at 37 °C. The next day, 0.2 μg fresh trypsin was added, and samples were
incubated for an additional 5 h at 37 °C.
Mass spectrometry. Analysis of affinity purification. The generated peptides (see
‘Affinity purification for LC-MS/MS’) were acidified with TFA to a final
concentration of 0.8% and analysed by LC-MS/MS with an EASY-nLC 1000 using the
two-column set-up (Thermo Scientific). The peptides were loaded with 0.1%
formic acid, 2% acetonitrile in H2O onto a peptide trap (Acclaim PepMap 100,
75 μm × 2 cm, C18, 3 μm, 100 Å) at a constant pressure of 80 MPa. Peptides were
separated at a flow rate of 150 nl min−1 with a linear gradient of 2-6% buffer
B in buffer A in 3 min, followed by a linear increase from 6 to 22% in 40 min,
22-28% in 9 min, 28-36% in 8 min and 36-80% in 1 min; the column was finally
washed for 14 min at 80% buffer B in buffer A (buffer A: 0.1% formic acid; buffer
B: 0.1% formic acid in acetonitrile). Separation was performed on a 50 μm × 15 cm
ES801 C18, 2 μm, 100 Å column (Thermo Scientific) mounted on a DPV ion source
(New Objective) connected to an Orbitrap Fusion (Thermo Scientific). The data were
acquired using 120,000 resolution for the peptide measurements in the Orbitrap and a
top T (3 s) method with HCD fragmentation for each precursor and fragment
measurement in the ion trap according to the recommendations of the manufacturer
(Thermo Scientific).
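
The LC gradient above can be written as a piecewise-linear table of segments. This makes it easy to check the total run length (75 min) or the %B at any time point. The sketch assumes that the segments follow each other without pauses, and it ignores column loading and re-equilibration, which the text does not describe.

```python
# (duration_min, start_percent_B, end_percent_B) for each segment, as stated in the text
GRADIENT = [
    (3, 2, 6),
    (40, 6, 22),
    (9, 22, 28),
    (8, 28, 36),
    (1, 36, 80),
    (14, 80, 80),  # isocratic wash at 80% B
]

TOTAL_MIN = sum(duration for duration, _, _ in GRADIENT)  # 75 min


def percent_b(t_min):
    """%B at time t_min, by linear interpolation within the active segment."""
    if t_min < 0 or t_min > TOTAL_MIN:
        raise ValueError(f"t must be within 0-{TOTAL_MIN} min")
    elapsed = 0.0
    for duration, start, end in GRADIENT:
        if t_min <= elapsed + duration:
            fraction = (t_min - elapsed) / duration
            return start + fraction * (end - start)
        elapsed += duration
    raise AssertionError("unreachable: t_min was range-checked above")
```

For example, `percent_b(23)` falls halfway through the 40-min segment and returns 14% B.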
Protein identification and relative quantification of the proteins were done
with MaxQuant version 1.5.3.8 using Andromeda as the search engine and
label-free quantification (LFQ) as described previously. The mouse subset of
UniProt version 2015_01, combined with the contaminant database from MaxQuant, was
searched, and the protein and peptide FDR values were set to 0.01. All MaxQuant
parameters can be found in the uploaded parameter file mqpar.xml (deposited in
the PRIDE repository, see Data availability).

Statistical analysis was done in Perseus (version 1.5.2.6). Results were
filtered to remove reverse hits, contaminants and peptides found in only one sample.
Missing values were imputed, and potential interactors were determined using a t-test
and visualized by a volcano plot. Significance lines corresponding to a given FDR
were determined by a permutation-based method. Threshold values (FDR)
were selected between 0.005 and 0.05 and S0 (curve bend) between 0.2 and 2.0, and
are shown in the corresponding figures. Results were exported from Perseus and
visualized using the statistical computing language R.
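
The text does not state how missing values were imputed. A widely used approach, and to our understanding the Perseus default, replaces missing log2 intensities with draws from a narrow normal distribution shifted below the observed values. This mimics proteins that fall below the detection limit. The width (0.3 SD) and downshift (1.8 SD) used here are assumptions, as is applying the procedure column by column. The sketch illustrates the idea and is not the authors' exact settings.

```python
import math
import random
from statistics import mean, stdev


def impute_downshifted(values, width=0.3, downshift=1.8, seed=0):
    """Replace missing log2 intensities (None or NaN) in one sample column.

    Draws come from Normal(mu - downshift * sd, width * sd), where mu and sd
    are the mean and standard deviation of the observed values in the column."""
    observed = [v for v in values if v is not None and not math.isnan(v)]
    if len(observed) < 2:
        raise ValueError("need at least two observed values")
    mu, sd = mean(observed), stdev(observed)
    rng = random.Random(seed)
    return [
        rng.gauss(mu - downshift * sd, width * sd)
        if v is None or math.isnan(v)
        else v
        for v in values
    ]
```

Observed values pass through unchanged. Only the gaps are filled, and they are filled at the low-intensity end of the column's distribution.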
iBAQ. Intensity-based absolute quantification (iBAQ) was performed as described
previously to evaluate protein abundances in the ChAHP complexes of the different
pull-down reactions.
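
iBAQ divides the summed intensity of all peptides assigned to a protein by the number of theoretically observable peptides of that protein. The sketch assumes a fully specific in-silico trypsin digest (cleavage after K or R, not before P, no missed cleavages). It counts peptides of 6-30 residues as observable, which is a commonly used range. MaxQuant's own implementation may differ in detail.

```python
import re


def tryptic_peptides(sequence):
    """Fully specific in-silico trypsin digest: cleave after K or R unless the
    next residue is P; no missed cleavages."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def ibaq(summed_intensity, sequence, min_len=6, max_len=30):
    """Summed peptide intensity divided by the number of observable tryptic peptides."""
    n_observable = sum(
        1 for p in tryptic_peptides(sequence) if min_len <= len(p) <= max_len
    )
    if n_observable == 0:
        raise ValueError("protein has no observable tryptic peptides")
    return summed_intensity / n_observable
```

Because the denominator grows with protein length, iBAQ values can be compared between subunits of a complex such as ChAHP, whereas raw summed intensities cannot.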

PRM data acquisition. Parallel reaction monitoring (PRM) analyses were 
performed using the same LC-MS system and gradient as described above. 
The acquisition method consisted of acquiring one MS spectrum at 120,000
resolution from 375 to 1,575 Da followed by 21 PRM spectra. An isolation window of
1.6 Da, a resolution of 240,000, and an automatic gain control value of 10° was used. 
Fragmentation was performed with a stepped HCD collision energy of 30 ± 5, and
MS/MS scans were acquired with a scan range from 110 to 1,800. 

PRM data analysis. The acquired PRM data were processed using Skyline 4. 
The transition selection was systematically verified and adjusted when necessary
to ensure that no co-eluting contaminant distorted quantification, based on the
co-elution of the traces (retention time) and on the correlation between the relative
intensities of the endogenous fragment ion traces and their counterparts from the library.

MASCOT 2.5 was used in decoy mode to search the Swiss-Prot mouse database
(version 2015_01) including common contaminants. The enzyme specificity was set to
trypsin, allowing for up to three missed cleavages. Carbamidomethylation
of cysteine (+57.0215 Da) was set as a fixed modification; oxidation of methionine
(+15.9949 Da) and acetylation of the protein N terminus (+42.0106 Da) were set
as variable modifications. Parent ion mass tolerance was set to 10 p.p.m. and
fragment ion mass tolerance to 0.6 Da. The results were validated with the program
Scaffold version 4.4 (Proteome Software). Protein identifications were accepted if
they could be established at a 0.1% FDR, as calculated in Scaffold.
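
To illustrate the search settings, the sketch computes a peptide's precursor m/z from monoisotopic residue masses, with every cysteine carbamidomethylated (fixed +57.02146 Da). It then tests whether an observed m/z falls within the 10 p.p.m. parent tolerance. The residue table and function names are illustrative only; MASCOT performs these calculations internally.

```python
# Monoisotopic residue masses (Da); cysteine is listed unmodified.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728
CARBAMIDOMETHYL = 57.02146  # fixed modification added to every cysteine


def precursor_mz(peptide, charge):
    """m/z of [M + zH]z+ for a peptide whose cysteines are carbamidomethylated."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    neutral += CARBAMIDOMETHYL * peptide.count("C")
    return (neutral + charge * PROTON) / charge


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm=10.0):
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm
```

Variable modifications such as methionine oxidation (+15.9949 Da) would be handled by enumerating modified forms of each candidate peptide. That step is omitted here for brevity.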
SEC of nuclear lysates. Nuclear lysates were isolated as described above
(‘Affinity purification for LC-MS/MS’ section) from 3 × 10-cm dishes of
AdnpFlag-AviTag/Flag-AviTag ES cells (CMB264). Nuclear lysates were then
concentrated to 250 μl final volume using Amicon Ultra 0.5 ml Centrifugal Filters
(3 kDa, Millipore), and fractionated by SEC on a Superose 6 HR 10/300 resin by
fast protein liquid chromatography (ÄKTA; Amersham-Pharmacia Biotech). The
predicted size exclusion maximum for this resin is 40 MDa, with a void volume
of 7.35 ml. The column was equilibrated in 2 column volumes of gel filtration
(GF) buffer (250 mM NaCl, 50 mM Tris-HCl pH 7.5, 1 mM DTT, 1× PIC) before
sample loading. A high-molecular-mass protein column standard (Sigma) was used to
define the column resolution. Protein peaks were detected by UV
monitoring. Thyroglobulin (669,000 Da) peaked in fractions 9 and 10. Before loading,
each nuclear lysate was adjusted to the appropriate column conditions and
centrifuged at 100,000g for 30 min. A 200-μl aliquot of lysate was loaded onto the column and
collected into 350-μl fractions; fractions were then subjected to trichloroacetic acid
(TCA) precipitation for western blot analysis. For TCA precipitation, the sample
volume was adjusted to 500 μl with GF buffer, followed by the addition of
50 μl 0.15% sodium deoxycholate; tubes were vortexed and incubated at room
temperature for 10 min. Protein was precipitated by the addition of 25 μl of 100%
TCA (Sigma), followed by a 20-min incubation at −20 °C. Precipitated proteins
were collected by centrifugation at 10,000g for 10 min at 4 °C. Protein pellets
were washed with acetone and air-dried. The protein pellet was solubilized in
1× sample buffer (62.5 mM Tris, pH 6.8, 0.72 M β-mercaptoethanol or 0.1 M DTT,
10% glycerol, 2% SDS and 0.05% bromophenol blue), resolved on NuPAGE
Novex Bis-Tris 4-12% gradient gels (Invitrogen) and subjected to western blot
analysis (see ‘Western blotting’ section for further details).

In vitro biochemistry. For cloning, cDNA encoding full-length human ADNP
(amino acid residues 1-1102) was PCR amplified with primers
NotI-ADNP-forward (5′-AAAAAAGCGGCCGCATGTTCCAACTTCCTGTCAACAA-3′) and
KpnI-ADNP-reverse (5′-AAAAAAGGTACCCTAGGCCTGTTGGCTGCTC-3′)
and cloned into a pFastBac-derived vector (Invitrogen) in frame with an N-terminal
His6-tag. Plasmids encoding full-length or N-terminally truncated ADNP
(amino acid residues 229-1102) with a C-terminal Strep-tag II were generated by
PCR amplification of ADNP cDNA with primers NotI-ADNP-forward and
KpnI-ADNP-C-reverse (5′-AAAAAAGGTACCGGCCTGTTGGCTGCTCAGTT-3′)
or NotI-ADNP(ΔN228)-forward (5′-AAAAAAGCGGCCGCATGCCAAAGTCCTATGAAGCTTT-3′)
and KpnI-ADNP-C-reverse. The amplified cDNA was cloned into a pAC8-derived
vector. Expression constructs encoding full-length human HP1γ (amino acid residues
1-183) were generated by amplification of cDNA using primers NotI-CBX3-forward
(5′-AAAAAAGCGGCCGCATGGCCTCCAACAAAACTACATT-3′) and KpnI-CBX3-reverse
(5′-AAAAAAGGTACCCTATTGAGCTTCATCTTCTGGA-3′) and cloning into
pFastBac-derived vectors in frame with an N-terminal His6-tag or Strep-tag II.
cDNA for the individual chromodomain (amino acid residues 11-81) or CSD (amino acid
residues 109-183) of HP1γ was amplified with primers NotI-CBX3(11)-forward
(5′-AAAAAAGCGGCCGCATGGGAAAAAAACAGAATGGAAAG-3′) and KpnI-CBX3(81)-reverse
(5′-AAAAAAGGTACCCTATTTCTGAGAGTTAAGAAACGC-3′) or NotI-CBX3(109)-forward
(5′-AAAAAAGCGGCCGCGATGCTGCTGACAAACCAAGAG-3′) and KpnI-CBX3-reverse, and
cloned into a pAC8-derived vector in frame with an N-terminal His6-tag. cDNA
encoding full-length human CHD4 (amino acid residues 1-1912) was amplified with
primers NotI-CHD4-forward (5′-AAAAAAGCGGCCGCATGGCCAGCGGCCTGGGAT-3′) and
KpnI-CHD4-reverse (5′-AAAAAAGGTACCCTACTGTTGCTGTGCAACCTG-3′).
The resulting PCR product was cloned into a pAC8-derived vector in frame with
an N-terminal His6-tag.
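
Each forward primer above carries a NotI site (GCGGCCGC), and each reverse primer carries a KpnI site (GGTACC), both behind a 5′ AAAAAA clamp. Reverse primers used for C-terminally Strep-tagged ADNP omit the stop codon, and this can be checked on the sense strand. The sketch below tests these properties for three of the primers quoted in the text. The helper names are ours, not part of any cloning software.

```python
NOTI_SITE = "GCGGCCGC"
KPNI_SITE = "GGTACC"
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def gene_specific_part(primer, site):
    """Sequence 3' of the first occurrence of the restriction site."""
    i = primer.find(site)
    if i == -1:
        raise ValueError(f"{site} not found in primer")
    return primer[i + len(site):]


def encodes_stop(reverse_primer, site=KPNI_SITE):
    """True if the sense strand encoded by a reverse primer ends in a stop codon."""
    sense = reverse_complement(gene_specific_part(reverse_primer, site))
    return sense[-3:] in STOP_CODONS


PRIMERS = {
    "NotI-ADNP-forward": "AAAAAAGCGGCCGCATGTTCCAACTTCCTGTCAACAA",
    "KpnI-ADNP-reverse": "AAAAAAGGTACCCTAGGCCTGTTGGCTGCTC",
    "KpnI-ADNP-C-reverse": "AAAAAAGGTACCGGCCTGTTGGCTGCTCAGTT",
}

if __name__ == "__main__":
    print("NotI in forward primer:", NOTI_SITE in PRIMERS["NotI-ADNP-forward"])
    for name in ("KpnI-ADNP-reverse", "KpnI-ADNP-C-reverse"):
        print(name, "encodes stop:", encodes_stop(PRIMERS[name]))
```

KpnI-ADNP-reverse encodes a TAG stop directly after the KpnI site. KpnI-ADNP-C-reverse ends on a GCC codon, leaving the reading frame open for the C-terminal tag.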

In vitro reconstitution of ChAHP. Full-length and truncated versions of ChAHP
subunits were subcloned into pAC8- or pFastBac-derived vectors. The following
constructs were generated: human ADNP (amino acid residues 1-1102 or
229-1102) with a C-terminal Strep-tag II, N-terminally His6-tagged human CHD4
(isoform 1, residues 1-1912) and N-terminally His-tagged variants of HP1γ
(residues 11-81 or 109-183) were cloned into pAC8-derived vectors. Full-length
human ADNP (amino acid residues 1-1102) in frame with an N-terminal His6-tag
and full-length human HP1γ (residues 1-183) in frame with an N-terminal
Strep-tag II were cloned into pFastBac-derived vectors. Baculoviruses were generated in
Spodoptera frugiperda Sf9 cells using the Bac-to-Bac method for pFastBac-derived
vectors or by cotransfection with viral DNA for pAC8-based vectors. After one
round of virus amplification in Sf9 cells, Trichoplusia ni High5 cells were infected
with the respective baculovirus (150 μl of virus per 10 ml of High5 cells at a
density of 2 × 10⁶ cells ml−1) and collected 48 h after infection. Cells were lysed by
sonication in 50 mM Tris, pH 7.5, 300 mM NaCl, 5 mM β-mercaptoethanol, 0.1%
Triton X-100, 1 mM PMSF, 1× PIC (Sigma-Aldrich). For pull-down experiments,
cell lysate of a 15 ml culture was added to 30 μl of Strep-Tactin Sepharose (IBA)
or 30 μl of His-tag purification resin (Roche) and incubated for 1 h at 4 °C. The
beads were washed three times with lysis buffer, supplemented with 30 mM
imidazole for histidine pull-down reactions. Proteins were eluted by addition of
2× sample buffer (62.5 mM Tris-HCl, pH 6.8, 2% SDS, 25% glycerol, 0.05%
bromophenol blue and 5% β-mercaptoethanol) and analysed by SDS-PAGE and
Coomassie staining.

For large-scale expression of the ChAHP complex, 1 l of Hi5 cells coinfected
with baculoviruses encoding His6-tagged ADNP and Strep-tagged HP1γ was
combined with 2 l of Hi5 cells expressing His6-tagged CHD4. Cells were lysed in
lysis buffer, and the cleared lysate was passed over a 50-ml Strep-Tactin Sepharose
(IBA) column. The bound complex was eluted in 50 mM Tris-HCl, pH 7.5, 100 mM
NaCl, 5 mM β-mercaptoethanol, 2.5 mM desthiobiotin, and bound to an
anion-exchange chromatography column (Poros HQ) equilibrated in 50 mM Tris-HCl,
pH 7.5, 100 mM NaCl, 5 mM β-mercaptoethanol. The bound proteins were eluted
using a linear NaCl gradient, concentrated and further purified by SEC (HiLoad
Superdex 200 26/600) in 50 mM HEPES-OH, pH 7.4, 150 mM NaCl and 0.5 mM
TCEP. Fractions containing the ChAHP complex were concentrated and reinjected
onto a Superdex 200 10/300 column equilibrated in the same buffer.
Computational methods. RNA-seq analysis. All sequencing reads were aligned to
the December 2011 (mm10) mouse genome assembly from UCSC. HP1-mutant
RNA-seq data were aligned using STAR 2.5.0a with the following settings to allow
reporting of one randomly chosen alignment per multi-mapping read:
'--outFilterMultimapNmax 20 --outMultimapperOrder Random --outSAMmultNmax 1
--alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999
--alignIntronMin 20 --alignIntronMax 100000 --alignMatesGapMax 100000
--outSAMtype BAM SortedByCoordinate'. Aligned and sorted reads were indexed using
SAMtools (version 1.2). ADNP-mutant RNA-seq data were aligned in Galaxy using
Bowtie with the parameters '-m 1 --best --strata'. Aligned BAM files were imported
into R using QuasR (1.14.0). BigWig files normalized for sequencing depth were
generated using the QuasR qExportWig function. Reads were counted over exons
using the qCount function and collapsed to yield one value per gene. This count
table was used for differential expression calling with the edgeR package. To
compare the different Cbx knockout cell lines with Adnp knockouts (Extended
Data Fig. 7), all biological replicates of the parental/untreated cell lines for Cbx3
and Adnp were used as the control group, whereas the respective knockout replicates
were considered the treatment group.


Gene Ontology analysis. Gene Ontology term analysis of upregulated gene sets was
performed using goana from the R package limma. For this analysis, significantly
upregulated genes (FDR < 0.01, fold change > 4) from the edgeR output were used.
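
Selecting the input genes for goana amounts to filtering the edgeR table on FDR and log2 fold change. A fourfold change corresponds to logFC > 2. The `DEResult` record below is a hypothetical stand-in for rows of the edgeR output, and the gene names in the test are made up.

```python
import math
from dataclasses import dataclass


@dataclass
class DEResult:
    """Hypothetical stand-in for one row of an edgeR results table."""
    gene: str
    log_fc: float  # log2 fold change (knockout versus control)
    fdr: float


def upregulated(results, max_fdr=0.01, min_fold=4.0):
    """Genes with FDR below max_fdr and more than min_fold-fold upregulation."""
    min_log_fc = math.log2(min_fold)  # fourfold -> logFC > 2
    return [r.gene for r in results if r.fdr < max_fdr and r.log_fc > min_log_fc]
```

Downregulated genes and genes that miss either threshold are excluded, which matches the "significantly upregulated" criterion stated above.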
Repeat analysis. RNA-seq libraries were mapped to the genome using STAR 2.5.0a
with settings optimized for maximum repeat recovery/mappability (--outFilterType
Normal --alignEndsType Local --winAnchorMultimapNmax 5000 --seedPerWindowNmax 1000
--alignTranscriptsPerReadNmax 100000 --seedNoneLociPerWindow 100
--alignWindowsPerReadNmax 20000 --alignTranscriptsPerWindowNmax 1000
--outFilterMultimapNmax 100000 --outSAMattributes NH HI NM MD AS nM
--outMultimapperOrder Random --outSAMmultNmax 1). The resulting
alignment file was intersected with RepeatMasker coordinates for mm10 (RepeatMasker
2012-02-07 update, downloaded from the UCSC Table Browser); alignments
overlapping repeats were counted for all repeat classes and normalized to 1 million
mapped reads per library.
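
The counting and normalization described above can be sketched as an interval-overlap tally per repeat class, followed by scaling to counts per million mapped reads. This naive scan is for illustration only and scales poorly, since the authors used a dedicated intersection step. The text also does not say how reads overlapping several elements were treated; here an alignment is counted once per class it touches (an assumption).

```python
from collections import Counter


def count_repeat_overlaps(alignments, repeats):
    """Count alignments overlapping each repeat class.

    alignments: iterable of (chrom, start, end); repeats: iterable of
    (chrom, start, end, repeat_class); all intervals half-open. An alignment
    overlapping several elements of the same class is counted once for that class."""
    by_chrom = {}
    for chrom, start, end, repeat_class in repeats:
        by_chrom.setdefault(chrom, []).append((start, end, repeat_class))
    counts = Counter()
    for chrom, start, end in alignments:
        classes = {
            repeat_class
            for r_start, r_end, repeat_class in by_chrom.get(chrom, [])
            if r_start < end and start < r_end
        }
        counts.update(classes)
    return counts


def per_million(counts, total_mapped_reads):
    """Scale raw counts to counts per 1 million mapped reads in the library."""
    return {cls: n * 1e6 / total_mapped_reads for cls, n in counts.items()}
```

Normalizing by the library's total mapped reads, rather than by reads on repeats, keeps repeat-class signals comparable between libraries sequenced to different depths.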

ChIP-seq read alignment. ChIP-seq data were aligned in R using the qAlign
function from the QuasR package with default settings, which calls the Bowtie
aligner with parameters '-m 1 --best --strata'. Depth-normalized BigWig files
were generated using QuasR 1.14.0. For H3K9me3, STAR 2.5.0a (--alignIntronMax
1 --alignEndsType EndToEnd --outFilterType Normal --seedSearchStartLmax 30
--outFilterMultimapNmax 10000 --outSAMattributes NH HI NM MD AS nM
--outMultimapperOrder Random --outSAMmultNmax 1 --outSAMunmapped Within)
was used, and non-aligning reads and multi-mappers were filtered out using SAMtools.
BigWig files displaying the full length of uniquely mapping reads were generated
using bedtools and bedGraphToBigWig (UCSC binary utilities).

Peak finding. ADNP peaks were called on ChIP replicates using the
corresponding inputs as background (all BAM files from QuasR alignment). MACS (version
2.1.1.20160309) was run with the default parameters. Peaks detected in at least 
two out of three replicates were kept. 
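
The text keeps peaks detected in at least two of three replicates but does not define how detection across replicates was matched. A minimal sketch, assuming that any overlap of at least 1 bp between replicate peaks counts as support and that supported peaks are then merged into consensus intervals, is shown below. Intervals are half-open and the function names are hypothetical.

```python
def overlaps(a, b):
    """True if half-open intervals (chrom, start, end) share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def reproducible_peaks(replicates, min_support=2):
    """Keep peaks supported by >= min_support replicates, then merge overlaps.

    replicates: list of peak lists, one per replicate. A peak's support is
    1 (its own replicate) plus the number of other replicates with an
    overlapping peak."""
    kept = []
    for i, peaks in enumerate(replicates):
        for peak in peaks:
            support = 1 + sum(
                any(overlaps(peak, other_peak) for other_peak in other)
                for j, other in enumerate(replicates)
                if j != i
            )
            if support >= min_support:
                kept.append(peak)

    # Merge overlapping supported peaks into one consensus interval.
    merged = []
    for chrom, start, end in sorted(kept):
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged
```

A stricter variant could require a minimum reciprocal overlap fraction instead of 1 bp. The text gives no such threshold.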

HP1γ peaks were called individually on HP1γ ChIP replicates from wild-type and
Adnp knockout cells, using the corresponding inputs as background (all BAM files from
QuasR alignment). MACS was run with the following options: --nomodel --shift
100 --extsize 200. Subsequently, peak lists were intersected using bedtools intersect.
Peaks present in both the wild-type and the Adnp knockout datasets that did not
contain the top-scoring ADNP motif were defined as ADNP-independent HP1γ
peaks. Note that the number of ADNP-independent HP1γ peaks is an
underestimate. HP1 proteins, particularly in heterochromatic regions, often cover large
domains, similar to H3K9me3. However, for consistency and to compile
an accurate control group for comparison with the sharp ChAHP peaks, we chose
to use MACS settings that are optimized for identification of narrow,
transcription-factor-like peaks rather than broad peaks. Hence the peaks called here represent
a stringent set of the most highly enriched loci in the absence of ADNP. Most of
the broader domains, which are also ADNP-independent, were not considered, as
their shape is very different from that of ChAHP peaks.
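
The definition of ADNP-independent HP1γ peaks (present in both wild-type and Adnp knockout data and lacking the top-scoring ADNP motif) can be sketched as below. Two simplifications are assumptions. First, a plain overlap test stands in for bedtools intersect. Second, the motif is searched as the exact GCCCCCTGGAG string on both strands, whereas HOMER scores motifs with a position weight matrix. `peak_sequence` is a hypothetical callable that returns a peak's DNA sequence.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def overlaps(a, b):
    """True if half-open intervals (chrom, start, end) share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def contains_motif(seq, motif):
    """Exact match of the motif on either strand (simplification of a PWM scan)."""
    seq = seq.upper()
    return motif in seq or reverse_complement(motif) in seq


def adnp_independent(wt_peaks, ko_peaks, peak_sequence, motif):
    """HP1γ peaks shared between wild-type and Adnp knockout cells that lack the motif.

    peak_sequence is a hypothetical callable mapping (chrom, start, end) to DNA."""
    shared = [p for p in wt_peaks if any(overlaps(p, q) for q in ko_peaks)]
    return [p for p in shared if not contains_motif(peak_sequence(p), motif)]
```

Peaks that disappear in the knockout, or that carry the ADNP motif, are excluded. What remains is the stringent control set described above.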

Motif finding. HOMER v.4.8 was used with default settings to identify DNA
sequence motifs in ADNP peaks.

Heat maps and meta-plots. Heat maps and meta-plots were generated from
averaged replicates using the command-line version of deepTools2. Peak centres were
calculated based on the peak regions identified by MACS (see above). BigWig
coverage files for individual replicates were generated by QuasR (see above). For
averaging replicates and for calculating log2(ChIP/input) ratios, bigwigCompare from
deepTools2 was used. To generate histone modification meta-plots for ChAHP-bound
loci (Extended Data Fig. 5), we used the following previously published
datasets: H3K4me1 (GSE27841), H3K4me2 and H3K27me3 (GSE25532),
H3K9ac (GSE31284), H3K9me2 (GSE54412) and H3K9me3 (GSE12241).
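
The replicate averaging and log2(ChIP/input) steps performed with bigwigCompare amount to bin-wise arithmetic on equally binned coverage tracks. The sketch makes three assumptions: a pseudocount of 1 is added to both tracks (our understanding of bigwigCompare's default, to be checked against the deepTools documentation), the tracks share the same bins, and plain lists stand in for BigWig files.

```python
import math


def average_replicates(tracks):
    """Bin-wise mean of equally binned coverage tracks (one list per replicate)."""
    return [sum(values) / len(values) for values in zip(*tracks)]


def log2_ratio(chip, control, pseudocount=1.0):
    """Bin-wise log2((ChIP + pseudocount) / (input + pseudocount))."""
    if len(chip) != len(control):
        raise ValueError("tracks must have the same number of bins")
    return [math.log2((c + pseudocount) / (i + pseudocount)) for c, i in zip(chip, control)]
```

The pseudocount keeps empty input bins from producing division by zero and damps ratios in low-coverage regions.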
ATAC-seq analysis. Paired-end reads were aligned using STAR 2.5.0a with default
parameters except for --alignIntronMax 1 and --alignEndsType EndToEnd. Only
uniquely mapping reads (STAR mapping quality of 255) were kept for further analysis.
These uniquely mapping reads were used to generate BigWig genome coverage files
as for ChIP-seq. Meta-profiles and heat maps were generated using deepTools2.
For the meta-profiles, the average fragment count per 10-bp bin was normalized
to the mean fragment count in the first and last five bins. This sets the
background signal to one for all experiments.
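
The meta-profile normalization described above divides every bin by the mean of the five leftmost and five rightmost bins, so that the flanking background equals one. A minimal sketch on a plain list of per-bin mean fragment counts (the input format is assumed):

```python
def normalize_profile(bin_counts, flank_bins=5):
    """Scale a meta-profile so that its flanking background equals 1.

    bin_counts: mean fragment count per 10-bp bin, ordered along the window.
    Every bin is divided by the mean of the first and last `flank_bins` bins."""
    if len(bin_counts) < 2 * flank_bins:
        raise ValueError("profile is shorter than the two flanks combined")
    flanks = list(bin_counts[:flank_bins]) + list(bin_counts[-flank_bins:])
    background = sum(flanks) / len(flanks)
    return [count / background for count in bin_counts]
```

After this scaling, peak heights read directly as fold over local background, so profiles from Adnp+/+ and Adnp−/− libraries of different depth can be overlaid.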

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Code availability. All custom code used to analyse data and generate figures is
available upon reasonable request.

Data availability. Genome-wide datasets are deposited at the Gene Expression 
Omnibus (GEO) under the accession number GSE97945. The mass spectrometry 
proteomics data have been deposited to the ProteomeXchange Consortium via the 
PRIDE partner repository with the dataset identifier PXD006226.
Extended Data Fig. 1 | Generation of isogenic mouse ES cell lines to
interrogate protein-protein and protein-chromatin interactions.

a, Mouse ES cells expressing the BirA biotin ligase from the Rosa26 locus
were used as a parental cell line for endogenous gene tagging with the
Flag-AviTag. For a full list of mouse ES cell lines used in this study
(CMB#), see Supplementary Information. b, Top, scheme depicting Flag-AviTag
(not drawn to scale) insertion at the endogenous Adnp locus.
Arrow indicates transcription start site. Boxes represent exons. Bottom,
scheme depicting ADNP protein. Protein domains as predicted by
InterPro. Nonsense (NS) and frameshift (FS) mutations found in children
with Helsmoortel-Van der Aa syndrome are indicated (https://www.adnpkids.com,
dated March 2017). Numbers denote amino acids. c, Sanger
sequencing of AdnpFlag-AviTag/Flag-AviTag cell lines. d, Distribution of ADNP-bound
genomic sites with respect to protein-coding genes. Peaks were
called from ChIP-seq data acquired from three independent biological
replicates (that is, three independent AdnpFlag-AviTag/Flag-AviTag mouse
ES cell lines). Horizontal bars represent peaks annotated to individual
categories, and vertical bars represent peaks annotated jointly to specified
combinations of categories. DIG, distal intergenic; UTR, untranslated
region. e, ChIP-seq profiles at two lineage-specifying gene loci that were
generated from three independent AdnpFlag-AviTag/Flag-AviTag ES cell lines,
Extended Data Fig. 2 | Generation and analysis of isogenic Adnp
knockout mouse ES cell lines. a, Scheme depicting CRISPR-Cas9 and
TALEN-induced double-stranded DNA breaks to delete the Adnp
open-reading frame. TSS, transcription start site. b, PCR genotyping confirming
homozygous deletion of the Adnp open-reading frame in three different
mouse ES cell lines used in this study. The experiment was performed
twice. c, MA plot comparing fold change (FC) in gene expression for
Adnp−/− versus Adnp+/+ cells (y axis) with mean mRNA abundance
(x axis). Representative endoderm-specific genes are highlighted in red. 
Dashed red lines indicate fourfold up- or downregulation. CPM, counts 
per million. d, Gene Ontology enrichment analysis of genes upregulated 
in Adnp−/− cells. n = 3 independent biological replicates. e, Scatterplot
comparing gene expression fold change upon Adnp knockout (y axis) with 
expression changes between extraembryonic endoderm (eXEN) and ES 
cells (x axis). Known key lineage markers are indicated in blue. 
Extended Data Fig. 3 | The HP1 interactome of mouse ES cells.

a, Isogenic mouse ES cell lines expressing endogenously tagged CHD4
and HP1 proteins. Western blot demonstrating expression of Flag-AviTag-tagged
proteins. The high molecular mass of CHD4 (218 kDa) does not
allow discernible separation of tagged from non-tagged protein. See
Supplementary Information for detailed genotype descriptions of the
individual ES cell lines. For gel source data, see Supplementary Fig. 1.
Experiments were performed twice. b, TAP-LC-MS/MS of endogenously
Flag-AviTag-tagged HP1α. Protein purification was performed in the
presence of 350 mM NaCl. Parental ES cell line serves as background
control. n = 3 independent biological replicates (that is, three independent
Cbx5Flag-AviTag/Flag-AviTag mouse ES cell lines). c, TAP-LC-MS/MS of
endogenously Flag-AviTag-tagged HP1β. Protein purification was
performed in the presence of 350 mM NaCl. Parental ES cell line serves as
background control. n = 3 independent biological replicates (that is, three
independent Cbx1Flag-AviTag/Flag-AviTag mouse ES cell lines). d, TAP-LC-MS/MS
of endogenously Flag-AviTag-tagged HP1γ. Protein purification was
performed in the presence of 350 mM NaCl. Parental mouse ES cell line
serves as background control. n = 3 independent biological replicates
(that is, three independent Cbx3Flag-AviTag/Flag-AviTag mouse ES cell lines).

e, Heat map showing the variation (Z-score) in co-purifying proteins
across HP1 isoform-specific TAP-LC-MS/MS experiments. Proteins
that were significantly enriched in at least one experiment (b-d) were
included in the analysis. a-e, Validating our approach, all three HP1
isoforms co-precipitated a large number of proteins. Many of these were
common to all three HP1 proteins and have previously been described.
We also observed several proteins that interacted uniquely with specific
isoforms, such as the previously identified CAF-1 or SENP7 interactions
with HP1α. f, Heat map visualization of Pearson's correlation
coefficients for the individual HP1 isoform-specific TAP-LC-MS/MS
experiments. Three independent biological replicates for HP1α and HP1γ,
two biological and one technical replicate for HP1β, and three technical
replicates for the parental cell line. g, iBAQ values of ADNP and CHD4 in
HP1 isoform-specific TAP-LC-MS/MS experiments. Three independent
biological replicates for HP1α and HP1γ, two biological and one technical
replicate for HP1β. Centre value denotes the mean; error bars denote s.d.
b-g, Statistical analysis was performed using Perseus (see Methods). Mass
spectrometry raw data are deposited with ProteomeXchange.
Extended Data Fig. 4 | In vitro characterization of ChAHP complex
composition. a, Strep-tag pull-down assays with recombinant human
proteins overexpressed in Hi5 insect cells, revealing that ADNP binds to
both CHD4 and HP1γ, whereas CHD4 and HP1γ do not interact directly.
b, SEC of the recombinant ChAHP complex. ChAHP was reconstituted
in Hi5 insect cells and further purified by separation according to its
molecular mass on a HiLoad Superdex 300 column. The largest species
elute first: ChAHP (1), followed by ADNP-HP1γ (2) and
HP1γ alone (3). c, Fractions from b were separated by SDS-PAGE and
visualized by Coomassie staining. d, Pull-down analysis of Strep-tagged
HP1γ (S-HP1γ) with full-length or N-terminally truncated ADNP
(ΔN228) or CHD4. e, Pull-down analysis of Strep-tagged full-length or
N-terminally truncated ADNP. b-e, Note that N-terminally truncated
ADNP does not co-elute with CHD4 on SEC (b, c). This is confirmed by
pull-down experiments (d, e), which show that ADNP lacking the first 228
amino acids is only able to bind to HP1γ but no longer to CHD4. Thus,
ADNP contacts CHD4 through its N terminus. f, Pull-down analysis
of His-tagged (H) full-length HP1γ, and isolated chromodomain (CD)
and chromoshadow domain (CSD). Similar to other proteins containing
the conserved PXVXL pentapeptide, ADNP directly interacts with the
CSD of HP1γ. This is consistent with the previously reported interaction
of ADNP with HP1α. The chromodomain of HP1γ does not bind to
ADNP. Experiments in a-f were performed at least twice. S denotes
the streptavidin tag added to the respective protein; asterisks denote a
common contaminating protein in streptavidin pull-down assays. For gel
source data, see Supplementary Fig. 1.
Extended Data Fig. 5 | HP1 occupancy at ChAHP-binding sites.

a, Subunit occupancy at ChAHP-bound sites displayed as meta-profile
integrating signal of all peaks. b, Genome browser screen shot of the
Igfbp4 locus. ChIP-seq tracks represent depth-normalized read counts of
averaged replicate experiments. The predicted ADNP DNA-binding motif
upstream of the Igfbp4 transcription start site is shown. c, Average HP1β
and HP1γ ChIP-seq enrichment on ChAHP-bound sites in wild-type cells,
and average HP1β and HP1γ ChIP-seq enrichment on ChAHP-bound
sites in Cbx3−/− and Cbx1−/− mouse ES cell lines, respectively. n = 2
biological replicates (that is, independently tagged mouse ES cell lines).
d, Average HP1α ChIP-seq enrichment on ChAHP-bound sites in
wild-type and Cbx1−/−Cbx3−/− double-knockout ES cell lines. n = 2 biological
replicates. e, Histone modifications associated with heterochromatin
are absent at ChAHP-bound sites. f, Histone modifications associated
with active transcription are absent at ChAHP-bound sites. e, f, Histone
modification profiles are displayed as meta-profiles integrating signal over
all peaks. g, Binding of wild-type and chromodomain mutant HP1γ to
ChAHP targets (Igfbp4 and Exd1), an H3K9me3-modified region next
to an L1 repeat (L1 chr4) and an inactive promoter of an unrelated gene
(Hoxc5), quantified by ChIP-qPCR. Fold enrichment was normalized to
an intergenic control region devoid of HP1 and H3K9me3. Wild-type
(grey) and mutant (red) HP1γ constructs were transiently transfected into
HP1 triple-knockout ES cells in biological duplicates. Note the decrease
of HP1γ binding at the H3K9me3-modified region (L1 chr4), whereas
binding at ChAHP targets remains unaffected in the chromodomain mutant (CDmut),
which can no longer bind to H3K9me3. Black lines indicate average
enrichments.
a, Table of significant motifs in ADNP peaks (columns: rank, motif, P value, % of targets, % of background).

Extended Data Fig. 6 | Motif analysis of ADNP-bound loci. a, ADNP
DNA-binding motifs predicted by HOMER. Frequency of occurrence
and P values for motif enrichment compared to genomic background
are indicated. n = 3 independent cell lines. b, Analysis of co-occurrence
of the top-ten scoring ADNP DNA motifs. The bar graph shows the
frequency of peaks containing the top-scoring ADNP motif and up to nine
additional motifs, as indicated on the x axis (number of significant motifs
in addition to the top-scoring motif). Note that most peaks containing
the GCCCCCTGGAG motif also contain more than five other sequence
motifs from the top-ten list.
Extended Data Fig. 7 | See next page for caption.
Extended Data Fig. 7 | Different HP1 isoforms can functionally
substitute for each other. a, Scatterplots comparing mRNA expression
changes after deletion of Adnp versus single, double or triple deletions
of Cbx genes, measured by RNA-seq. Green trend lines indicate a loess
(locally weighted scatterplot smoothing) regression. n = 3 biological
replicates (that is, three independent Adnp−/−, Cbx1−/−, Cbx3−/−,
Cbx5−/−, Cbx1−/−Cbx3−/− double-knockout or Cbx1−/−Cbx3−/−Cbx5−/−
triple-knockout mouse ES cell lines). b, MA plot displaying fold changes
in gene expression for individual Cbx knockout cell lines versus wild type.
The x axis denotes the mean mRNA abundance, log2(counts per million);
the y axis denotes the log2(fold change) between knockout and wild
type. Dashed red lines indicate fourfold up- or downregulation.
c, UCSC genome browser shots of three lineage-specifying genes. RNA-seq
profiles normalized by library size of representative wild-type and
mutant ES cell lines are shown. Experiments were performed three times.
d, Gene Ontology enrichment analysis of the genes that are upregulated
in both Cbx1−/−Cbx3−/−Cbx5−/− triple-knockout (TKO) and Adnp−/−
knockout (KO) cells (orange group of genes in a), and of the genes that
are upregulated in Cbx1−/−Cbx3−/−Cbx5−/− triple-knockout but not
Adnp−/− knockout cells (grey group of genes in a). See also Supplementary
Table 5. n = 3 independent cell lines. e, RNA-seq library statistics showing
the fraction of uniquely mapping, multi-mapping and non-mapping reads.
Note the increase in multi-mappers in the HP1 triple-knockout cells.
f, Quantification of reads mapping to the major repeat classes in counts
per million mappable reads. g, Quantification of reads mapping to the
different LINE and LTR elements in counts per million mappable reads.
All mutant cell lines were derived from the same parental mouse ES cell
line through direct genome editing and are therefore isogenic.
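The axes of the MA plot in panel b can be illustrated with a minimal Python sketch. It assumes a plain two-sample comparison: raw read counts are scaled to counts per million (CPM), a small pseudocount (a hypothetical choice, not taken from the study) avoids taking the log of zero, the x value is the mean log2(CPM) of the two samples and the y value is log2(knockout/wild type). A fourfold change then corresponds to |log2 fold change| ≥ 2. The function names are hypothetical, and the study's own normalization pipeline may differ.

```python
import math


def cpm(counts, library_size):
    """Counts per million: scale raw read counts by total library size."""
    return counts * 1_000_000 / library_size


def ma_point(ko_count, ko_lib, wt_count, wt_lib, pseudocount=1.0):
    """Return (A, M) for one gene in a knockout-versus-wild-type comparison.

    A (x axis): mean of log2(CPM) across the two samples.
    M (y axis): log2(fold change) of knockout over wild type.
    The pseudocount is an assumed value that prevents log2(0).
    """
    ko = cpm(ko_count, ko_lib) + pseudocount
    wt = cpm(wt_count, wt_lib) + pseudocount
    a = (math.log2(ko) + math.log2(wt)) / 2
    m = math.log2(ko / wt)
    return a, m


def beyond_fourfold(m):
    """Fourfold up- or downregulation corresponds to |log2 FC| >= 2."""
    return abs(m) >= 2
```

For example, with the pseudocount set to zero, a gene at 400 CPM in a knockout and 100 CPM in wild type has M = log2(4) = 2 and sits exactly on the fourfold threshold (the dashed red line).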
Extended Data Fig. 8 | A patient-specific nonsense mutation in
Adnp impairs the interaction with HP1 but not with DNA. a, Scheme
depicting the wild-type and mutant Adnp alleles, which code for Tyr (blue)
and a patient-specific premature termination codon (red) at amino acid
position 718, respectively. Full-length and truncated protein products
are shown on the right. Arrow indicates transcription start site. Boxes
represent exons. Numbers denote amino acids. b, N-terminally
Flag-AviTag-tagged ADNPPTC718 was streptavidin-purified from cells with
and without aminoglycoside treatment (gentamycin or paromomycin)
and subjected to LC-MS/MS analysis. ADNPPTC718-expressing cells were
treated with 2 mg ml−1 gentamycin (2.9 mM) or paromomycin (3.2 mM)
for 24 h. The table depicts total spectral counts, unique peptides and
percentage sequence coverage (derived from Scaffold) for all ChAHP
components from the different treatments. c, RT-PCR measurement
of Bmp1 and Igfbp4 mRNA levels in ES cells expressing full-length Adnp
(Adnp+/+) or C-terminally truncated Adnp that interacts with CHD4 but
not with HP1 (AdnpPTC718/PTC718). n = 3 biological replicates (that is, three
independent RNA isolations). P values were calculated using two-tailed
unpaired unequal-variance t-tests. Centre value denotes the mean; error
bars denote s.d. d, ChIP-qPCR enrichments for transiently transfected
Flag-AviTag-tagged wild-type ADNP and ADNPPTC718 constructs on two
ADNP targets, normalized to an intergenic control. Black lines indicate
means. e, C-terminally Flag-AviTag-tagged ADNPPTC718 was
streptavidin-purified from cells with or without gentamycin treatment
(2.9 mM) and subjected to LC-MS/MS analysis. Bold letters indicate unique
peptides further quantified by parallel reaction monitoring (PRM). C-terminal
peptides encoded downstream of PTC718 are shown in colour. Dashed
box denotes the HP1 interaction motif. f, Summed fragment intensities of
five C-terminal ADNP peptides that are encoded downstream of PTC718
are shown on the left. Background proteins shown on the right serve as
loading controls. Intensities were measured by PRM. Total spectrum
counts were derived from Scaffold. n = 3 biological replicates.
Extended Data Fig. 9 | ChAHP and NuRD are distinct protein
complexes. a, Single-step purification followed by LC-MS/MS of
endogenously Flag-AviTag-tagged CHD4 and ADNP. Protein purification
was performed in the presence of 350 mM NaCl. Proteins that interact
predominantly with CHD4 or ADNP are indicated by UniProt names.
NuRD complex components are labelled in green. n = 3 biological
replicates (that is, three independent Chd4Flag-AviTag/Flag-AviTag and
AdnpFlag-AviTag/Flag-AviTag ES cell lines). Statistical analysis was done with
Perseus (see Methods for details). Mass spectrometry raw data are
deposited with ProteomeXchange. b, Size-exclusion chromatography (SEC)
of nuclear protein extracts from AdnpFlag-AviTag/Flag-AviTag ES cells. Each
fraction (indicated at the bottom) was resolved by SDS-PAGE and
immunoblotted with the indicated antibodies. Molecular mass of individual
proteins is indicated on the left. For gel source data, see Supplementary
Fig. 1. Experiment was performed twice.


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


LETTER 


[Figure panels: western blots (anti-HP1α, anti-HP1β, anti-HP1γ, anti-Tubulin) of Cbx mutant ES cell lines with and without 4-OHT; bar chart of Cbx5, Cbx1 and Cbx3 mRNA expression in cMB282 relative to control, normalized to Tbp (log2), untreated versus 4-OHT treated]

Extended Data Fig. 10 | Isogenic Cbx knockout ES cell lines. a, Western
blot demonstrating depletion of HP1α protein in the Cbx5fl/fl mouse ES
cell line after treatment with 4-hydroxytamoxifen (4-OHT). b, Western
blot demonstrating depletion of HP1β protein in three independent
Cbx1−/− ES cell lines. c, Western blot demonstrating depletion of HP1γ
protein in the Cbx3fl/fl ES cell line after treatment with 4-OHT. d, Western
blot demonstrating depletion of HP1β and HP1γ proteins in three independent
Cbx1−/−Cbx3fl/fl double-knockout cell lines after treatment with 4-OHT.
n = 3 independent 4-OHT treatments. e, RT-PCR demonstrating
depletion of Cbx5, Cbx1 and Cbx3 mRNAs in the Cbx1−/−Cbx3fl/flCbx5fl/fl
triple-knockout cell line upon treatment with 4-OHT. f, Western
blot demonstrating depletion of HP1γ protein in three independent
Cbx1Flag-AviTag/Flag-AviTag cell lines. g, Western blot demonstrating depletion
of HP1β protein in three independent Cbx3Flag-AviTag/Flag-AviTag cell lines.
For gel source data shown, see Supplementary Fig. 1. Experiments were
Satellite images such as this one, depicting wildfires in Tasmania, Australia, can be used to track and visualize threats to the natural environment.


Crunch time for data 


Satellite and cloud-computing services are opening worlds of opportunity for researchers. 


BY GABRIEL POPKIN 


Samapriya Roy remembers when it would
take him up to an hour to download a
single 1-gigabyte image taken by the
Landsat Earth-imaging satellites. That was in
the late 2000s, when he was analysing satellite 
imagery as part of his undergraduate studies at 
Visvesvaraya National Institute of Technology 
in Maharashtra state, India. And the computer 
analysis of a picture could take even longer. 
Sometimes Roy would start the analysis at night 
and it would still be running the next morning. 
Things are very different nowadays. Roy, 
who is a PhD student at Indiana University in 
Bloomington, uses a Google platform to store 
his data and run his algorithms and is able to 
crunch tens of thousands of images in minutes; 
all he needs is a web browser. “It brings everyone
to a level playing field,” he says. In addition to
data from US government sources such as 
Landsat, he uses sharp, detailed images from 
three commercial satellite companies — two of 
which didn't exist when he was an undergrad- 
uate — to research coastal land loss in Louisiana 
and the Amazon region of Brazil. 

In the past few years, technology and 
satellite companies’ offerings to scientists 
have increased dramatically. Thousands of 
researchers now use high-resolution data from 
commercial satellites for their work. Thousands 
more use cloud-computing resources provided 
by big Internet companies to crunch data 
sets that would overwhelm most univer- 
sity computing clusters. Researchers use the 
new capabilities to track and visualize forest 
and coral-reef loss; monitor farm crops to 
boost yields; and predict glacier melt and
disease outbreaks. Often, they are analysing
much larger areas than has ever been 
possible — sometimes even encompassing 
the entire globe. Such studies are landing in 
leading journals and grabbing media attention. 
Commercial data and cloud computing are 
not panaceas for all research questions. NASA 
and the European Space Agency carefully cali- 
brate the spectral quality of their imagers and 
test them with particular types of scientific 
analysis in mind, whereas the aim of many com- 
mercial satellites is to take good-quality, high- 
resolution pictures for governments and private 
customers. And no company can compete with 
Landsat’s free, publicly available, 46-year archive 
of images of Earth’s surface. For commercial 
data, scientists must often request images 
of specific regions taken at specific times, 
and agree not to publish raw data. Some 


31 MAY 2018 | VOL 557 | NATURE | 745 




companies reserve cloud-computing assets
for researchers with aligned interests such as 
artificial intelligence or geospatial-data analysis. 
And although companies publicly make some 
funding and other resources available for 
scientists, getting access to commercial data and 
resources often requires personal connections. 
Still, by choosing the right data sources and 
partners, scientists can explore new approaches 
to research problems. 


MAPPING POVERTY 

Joshua Blumenstock, an information scientist 
at the University of California, Berkeley (UCB), 
is always on the hunt for data he can use to map 
wealth and poverty, especially in countries 
that do not conduct regular censuses. “If 
you're trying to design policy or do anything 
to improve living conditions, you generally 
need data to figure out where to go, to figure 
out who to help, even to figure out if the things 
you're doing are making a difference.”

In a 2015 study, he used records from 
mobile-phone companies to map Rwanda’s 
wealth distribution (J. Blumenstock et al. 
Science 350, 1073-1076; 2015). But to track 
wealth distribution worldwide, patching 
together data-sharing agreements with hun- 
dreds of these companies would have been 
impractical. Another potential information 
source — high-resolution commercial satel- 
lite imagery — could have cost him upwards 
of US$10,000 for data from just one country. 

Blumenstock then learnt that Facebook 
had bought commercial satellite images for a 
programme it launched in 2014 to connect the 
global population to the Internet. After chats 
with a Facebook researcher on the project, he 
and the social-networking giant hammered 
out an agreement. Facebook would fund one 
of his graduate students to use the company’s 
technology to study how economic data from 
public surveys correlated with the visual char- 
acteristics of buildings represented in the 
satellite data. Facebook, in turn, could poten- 
tially gain a sharper view of the socio-economic 
characteristics of rural areas, whose residents 
are least likely to have Internet connections. 
(Facebook declined to comment.) 

The arrangement presented some challenges, 
however. Facebook demanded a non-disclosure 
agreement before sharing data. (Blumenstock 
does not have access to personal Facebook 
user data, only to satellite and other aggregated 
data.) And UCB industry-partnership special- 
ists scrutinized the agreement to ensure that it 
wouldn't compromise academic integrity. Pri- 
vacy concerns are likely to loom larger from 
now on. In the wake of allegations in March 
that a UK consultancy had deployed Facebook 
user data for US political purposes, universi- 
ties and companies might be examining their 
agreements more closely. 

Facebook’s command of machine learning 
and cloud computing was also the main draw 
for Robert Chen, a geographer at Columbia 
University in New York City, who collaborates
with the company to study global population
distribution. Data crunching that would have
once taken years was completed in weeks,
enabling Chen and his colleagues to produce
high-resolution population maps of rural
areas in 18 countries around the world (see
go.nature.com/2s1dgq4). “Facebook can pro-
cess 14.5 billion images in a couple of weeks,”
he says. The social-media firm’s main goal for
the project is to provide global Internet access
(and reach more potential users). Chen aims
to apply the maps to humanitarian assistance,
conservation and development planning.

Google’s data centres support its cloud-based Earth Engine platform, which processes geospatial data.

Other hi-tech goliaths are making resources
available to researchers. Microsoft's AI for
Earth, which launched in late 2017, has enabled
more than 60 research groups from more
than 20 countries to analyse remote-sensing
data sets from Esri, a mapping and geospatial-analysis
company in Redlands, California, using
Microsoft’s artificial intelligence (AI) algorithms
and computing power. Microsoft’s chief
environmental scientist, Lucas
Joppa, says that AI can supercharge remote-
sensing research by ferreting out previously
hidden patterns in data. For example, a team
including Milind Tambe, a computer scientist
at the University of Southern California in Los
Angeles, has used Microsoft algorithms to pre-
dict wildlife-poaching activity in Africa from
drone imagery (see go.nature.com/2s2z5ta).
Researchers apply online for initial access
to the programme. If Joppa and his colleagues
find a project promising, they collaborate and
share expertise and in-kind resources, such as
computing time, to help the research advance.
Amazon Web Services, the cloud-computing
branch of the e-commerce giant Amazon,
started hosting the Landsat archive in early 
2015. In September 2016, the company 
launched its Earth on AWS programme, 
through which it hosts around 15 data sets, 
including imagery, weather data from the 
US National Oceanic and Atmospheric 
Administration, and air-quality data from 
the non-profit organization OpenAQ in 
Washington DC. Although anyone can pay 
to analyse the data using Amazon’s com- 
puters, scientists can apply for donations of 
computing time; applications must include a 
description of the research problem and plans 
for dissemination of the results. 

Google now hosts more than 600 public 
satellite, weather, population and other 
Earth and environmental data sets through its 
Earth Engine platform. More than 70,000 users 
— most of them researchers — have created 
free accounts on the platform, says Rebecca 
Moore, Earth Engine’s director of engineering. 

The first global study done on the platform 
yielded a blockbuster paper on maps of forest 
change based on Landsat data; it has racked 
up nearly 3,000 citations in less than 5 years 
(M. C. Hansen et al. Science 342, 850-853; 
2013). Google’s infrastructure jump-started 
the project in 2013 by turning what would 
have been 15 years’ worth of data crunching 
on one computer into a job that took just a few 
days, says Matthew Hansen, a geographer at 
the University of Maryland in College Park 
who led the study. 

The platform has since supported global 
studies of surface water, fish stocks, urban 
agriculture and transport networks, as well 
as smaller-scale studies. For Daniel Weiss, an 
epidemiologist at the University of Oxford, 
UK, who used Earth Engine to map travel time 
from any point on the globe to the nearest city 
(see go.nature.com/2ibwhbm), the platform 
efficiently crunched a computationally
expensive algorithm, saving months of work. 
The map itself is now a public resource on 
Earth Engine, and Weiss and his team are 
using it to produce better forecasts of malaria 
outbreaks. 


MORE THAN PRETTY PICTURES 

The growing fleet of satellite companies is 
serving up an increasingly diverse menu 
of data and images. Around 20 companies 
worldwide now offer or plan to offer Earth- 
observing capabilities. These firms, which 
have conventionally served military and 
private-sector clients in finance, agricul- 
ture and other arenas, are increasing their 
overtures to scientists. 

In 2017, satellite company DigitalGlobe 
in Boulder, Colorado, provided scientists 
with high-resolution images worth around 
$6 million through its DigitalGlobe 
Foundation, according to the founda- 
tion’s president, Kumar Navulur. For some 
researchers, the company’s super-sharp 
satellite-borne cameras have enabled previ- 
ously difficult or impossible studies. Sarah 
Parcak, for example, an archaeologist at 
the University of Alabama at Birmingham, 
has used DigitalGlobe imagery to discover 
hidden sites in Egypt and elsewhere, and to 
track looting incidents. 

Satellogic, a company in Buenos Aires 
founded in 2010, has promised to make 
hyperspectral data — information-rich 
imagery derived from light in dozens 
of wavelength bands — available to any 
scientist who wants them. No public satel- 
lite currently collects such data, which many 
scientists prize for its usefulness in applica- 
tions such as detecting drought stress in 
plants and exploring for minerals. The com- 
pany says that it has shared hyperspectral data 
with around two dozen researchers; Roy says 
he got access to some data for his Louisiana 
research after an e-mail exchange. 

The satellite company Planet, based in 
San Francisco, California, images the globe 
daily, the side of each pixel in an image rep- 
resenting between 3 and 5 metres on the 
ground. The company makes data available 
to scientists through its research and educa- 
tion programme, which offers free data for 
up to 10,000 square kilometres a month to 
scientists who apply. 

Institutions can also take out subscriptions 
for larger data volumes. Planet has provided 
imagery to more than 1,600 researchers from 
more than 70 countries, according to Joseph 
Mascaro, the company’s director of academic 
programmes. The company’s frequent images 
enabled Andreas Kääb, a geoscientist at the
University of Oslo, to track melting glaciers
in near-real time in Tibet, which showed
that weather and climate change caused the
glaciers to suddenly collapse (A. Kääb et al.
Nature Geosci. 11, 114-120; 2018). In 2016,
he had warned the Chinese government of
an impending avalanche in Tibet on the basis
of signals he had detected in Planet's images.

Kääb’s research has benefited not just
from the imagery itself but also from access
to company staff, he says. “We typically
write to Joe [Mascaro] and he connects us to
someone from the team,” Kääb says. “I feel
to some extent I am part of the game, part of
the process.”

Using commercial data can have down- 
sides. Companies such as DigitalGlobe and 
Satellogic typically take pictures that paying 
customers request, so scientists might find 
that no data are available for their area or time 
of interest. Government restrictions can also 
limit data availability. Mascaro and Navulur 
are prohibited by US law from sharing 
extremely high-resolution imagery of cer- 
tain countries such as Israel, and cannot share 
data with anyone in Iran or North Korea. 
Blumenstock once found that Planet imagery 
he wanted for a project in Afghanistan was 
unavailable owing to an unspecified secu- 
rity concern. Identifying individual people 
or vehicles is impossible, Navulur says; this 
alleviates some privacy concerns, although 
pictures can be sharp enough to make out 
houses and other structures. (Of course, for 
large areas of the world, so is Google Maps’ 
public imagery.) 


KNOW YOUR NEEDS 
Use of commercial images can also be 
restricted. Scientists are free to share or 
publish most government data or data they 
have collected themselves. But they are typi- 
cally limited to publishing only the results 
of studies of commercial data, and at most a 
limited number of illustrative images. 
Many researchers are moving towards a 
hybrid approach, combining public and com- 
mercial data, and running analyses locally or 
in the cloud, depending on need. Weiss still 
uses his tried-and-tested ArcGIS software 
from Esri for studies of small regions, and 
jumps to Earth Engine for global analyses. 
The new offerings herald a shift from an 
era when scientists had to spend much of 
their time gathering and preparing data to 
one in which they’re thinking about how 
to use them. “Data isn’t an issue any more,”
says Roy. “The next generation is going to be
about what kinds of questions are we going
to be able to ask?”


Gabriel Popkin is a freelance writer in 
Mount Rainier, Maryland. 


CORRECTION 

The Careers feature ‘Behind the scenes’ 
(Nature 556, 525-527; 2018) erred in its 
description of the winning photo. Callie 
Veelenturf was measuring pH, conductivity 
and temperature after the turtle had laid 
her eggs, not taking samples beforehand. 


PUBLISHING 
Unequal authorship 


An analysis of more than 10 million 
scientific and medical studies published 
between 2002 and 2018 suggests that 
male authors will continue to outnumber 
female authors for at least the rest of this 
century. The findings, published in PLoS 
Biology, examined articles in science, 
technology, engineering, mathematics 
and medicine (L. Holman et al. PLoS 
Biol. 16, 2004956; 2018). Female authors 
were particularly scarce in the fields of 
physics, computer science, maths and 
surgery. For example, women accounted 
for just 13% of prestigious last-author 
spots in physics studies. That percentage 
has crept upwards by about 0.1 percentage points a year
since 2002, suggesting that if current 
trends hold, authorship in physics studies 
could reach equality in roughly 260 
years. The gender disparity in authorship 
was especially pronounced in papers 
from Japan, Germany and Switzerland, 
whereas the most gender-equitable 
countries were in South America, Africa 
and elsewhere in Europe. “Without novel 
interventions, these fields are likely to 
remain gender-biased for many decades,”
says lead author Luke Holman, an 
evolutionary biologist at the University of 
Melbourne in Australia. “Despite recent 
gains, we still have far to go.” 


CONDUCT 


Drop harassers 


An online petition is calling for the US 
National Academy of Sciences (NAS) to 
revoke the membership of scientists who 
commit sexual harassment or assault. The 
move brings fresh attention to a troubling 
issue. As of 23 May, the petition, created by 
neuroscientist BethAnn McLaughlin of the 
Vanderbilt Kennedy Center in Nashville, 
Tennessee, had gained 2,289 signatures. 
“I’m impressed by the response,”
McLaughlin says. “It speaks to the fact
that time’s up in academia.” Comments
on the petition point to other non-
scientific organizations that have revoked
membership for misconduct. In February,
the US National Science Foundation
(NSF) announced that it would require
universities and institutions to disclose
the identities of NSF-funded researchers
who have been disciplined for harassment.
The NAS takes harassment and assault 
“very seriously”, says Jennifer Walsh, a
spokesperson for the National Academies 
of Sciences, Engineering, and Medicine. 

A 22 May statement by the presidents of 
each of the three academies says that a 
dialogue has started about the standards of 
professional conduct for membership. 


SCIENCE FICTION


Body of evidence


BY MIKE ADAMSON


Murder has almost vanished from
the spectrum of human existence.
Not from any elevated moral sense
or spiritual refinement, but merely because it
is no longer true that dead men tell no tales.
The afternoon is bright and warm,
and the terraces of the glittering ver-
tical architecture of New Johannes-
burg are thronged with passers-by,
robots going about their tasks. I stroll
in the sun, enjoying the breeze from
the sea, and the peace seems invio-
late. I barely remember my original
name — I must consult my files,
but at the moment the detail is not
important. There is someone I must
meet, but my heart is ambivalent.
Will I thank this person, or twist his
head off with the hydraulic power
hidden beneath svelte, ebon arms?
Towards the end of the twenty-
third century, the human race has
spread across the stars, our vessels
have touched more than a hundred
new worlds, and Old Earth is shak-
ing free of the devastation of its past.
The population is a fraction of its old
levels, and technology has redressed
many a shortcoming. Among the
utopic inventions of a race new-born
to the Universe are the many strategies by
which death has ceased to be a certainty.
Death came for me in an accident — the
failure of a sub-runner, one of the bullet
trains that streak across the world by mag-
netic induction in vacuum tunnels. The
passengers were saved by the many failsafes,
but I was unlucky enough to be in the way
of a collapsing stanchion as the mass of the
passing Cairo–New Jo’burg express shook
plascrete and reinforced alloys to pieces. I
was told afterwards that there had been no
time in which to suffer. A quick, clean exit.
It was only later that I discovered there
had also been doubt. My body was damaged,
but not beyond local systems to suspend and
repair. Eyebrows were raised in some circles,
reviews promised, but I heard nothing more;
until a concerned doctor passed me the con-
fidential finding that the supervising medi-
cal technician had not pursued the possible
options but had selected immediate cyber-
incarnation.

Cyber-incarnation is the standard fallback
by law, in which the personality is
downloaded from the chemical matrix of
the brain and stored, pending reanimation 
in any of the available prostheses — cyborg, 
android, hardlight hologram, whatever the 
individual prefers. Only the rich can afford 
their own new body, of course; the rest of us 
make do with a time-share arrangement, and 


although it’s sometimes inconvenient, there 
are consolations. 

Yesterday, I was a surfer riding high, glassy
walls coming down on the ocean beaches;
the day before, I delighted in music, access-
ing hardwired routines in the hologram
I phased with — I had never played piano
before but certainly want to again. Today, I
am in a body I would have envied before my
untimely passing, long and dark, and dressed
to impress in flowing blues. She draws eyes
and I revel in the fact that no one knows who
I am. Only the dead recognize each other,
sensing the simulacra we have become, and
trade knowing smiles as we pass, for we are
now an elite.

I see the one I'm after, taking a seat at an
outdoor café at the foot of the cliff of glass 
fronting a hotel, its landing pads high above 
throwing cool shade. This is his lunch hour. 

I walk with confidence, the strike of my 
heels almost lost in the sounds of strolling 
people, until I slide into a seat opposite and 
hold his eyes silently. He is not unhand- 
some, fair hair thick above a strong-boned 
skull with firm jaw; indeed his rejuvena-
tion is superb, retarding his 120 years to a


comfortable 30, and I sense he appreciates 
this glove of flesh and alloys I wear. “Doctor 
Rensburg,” I begin, a statement. 

A robowaiter hovers nearby. Rensburg 
softly orders tea — for two. 

“You have me at a disadvantage,” he mur-
murs. “Isn’t that how they used to put it?”

I offer a fine hand before I smile
and at last recheck — Onika, yes,
that was my name. “Onika Kabila.”

He frowns, clearly not recalling.
“Have we met?” He grins now. “I’m
sure I would remember one so
charming.”

I flash him the smile his words 
earn, then sit back to stare off at 
the sea as the drone returns with 
fine china and a glass pot. When 
we cradle cups of fragrant blend, I 
have made up my mind. I will not 
kill him. 

Instead I raise my cup in salute. 
“Thank you,” I whisper, my smile 
suddenly very genuine. “You could 
have dropped me back into my 
living body but you prematurely 
reassigned me. I know you earn 
a gratuity from the cybernetics 
manufacturers for every patient you 
send into hard-backup, keeping the 
android industry in full swing, and 
for along time I meant to kill you for 
it” His eyes widen and my electronic senses 
see pulses race, pupils dilate, a dozen other 
tells. “But they would only pour you back 
onto another hard drive, and I’m not so hate- 
filled I would destroy your brain entirely to 
prevent it” My stare holds his eyes like glass 
knives as he pivots upon the uncertainty of 
the instant. “But you know what? I like where 
I am now. It’s not so different, and better in
some ways.” I sip again, then rise, lean across
the table and speak softly by his ear. “So I'll 
wish you good luck and long life, but hope 
you'll leave the choice to your patients in the 
future.” 

I rise and walk on in the sun, losing myself
among the crowd on the terraces, and 
know Rensburg is trembling with shock, a 
very human thing of which I am no longer 
capable. But that’s a small price to pay. 

Because I have learnt that life really does 
begin at the end.


Mike Adamson holds a PhD in archaeology 
from Flinders University of South Australia. 
After early aspirations in art and writing, 
Mike returned to study and currently 
lectures in anthropology. 
There are many things that parents would love to pass
down to their children. Houses, jewellery, money: all
can be welcome gifts from one generation to the next.
But not everything that can be bequeathed is so desirable. 
Huntington's disease — a neurodegenerative condition that 
causes uncontrollable movements, emotional disturbance 
and the loss of mental abilities — is an especially unfortunate 
genetic hand-me-down. 

At a glance, the biology of Huntington's disease seems
to be simple (see page S36). The condition has been traced
to a mutation in a single gene on chromosome 4 that is 
responsible for producing a protein called huntingtin. 

And yet fundamental aspects of the molecular and cellular 
processes that underlie Huntington’s disease remain a 
mystery. Glaringly, researchers have still not worked out 
huntingtin’s role in the cell. 

There is no cure on the horizon. But a clinical trial of a
potential treatment has raised hopes that such research is 
starting to bear fruit. The innovative treatment comprises 
an antisense oligonucleotide — a molecule that binds to 
messenger RNA to prevent the production of a specific 
protein (S39). Another possibility is to edit the faulty 
gene directly (S42). And as researchers pursue these new 
approaches, they must also wrestle with how to measure 
success without the need to track the health of patients for 
years, or even decades, to come (S46). 

Huntington's disease is unusual in that its diagnosis 
generally occurs well into the child-bearing years, which can 
lead to anguished decisions over whether to take a high- 
stakes genetic gamble with future offspring (S38). But in rare 
instances, the disease also strikes children (S44). 

We are pleased to acknowledge the financial support of 
F. Hoffmann-La Roche in producing this Outlook. As always,
Nature has sole responsibility for all editorial content. 
HUNTINGTON'S DISEASE

BIOLOGY

Chain of mystery


Decades after uncovering the 
genetic basis of Huntington’s 
disease, researchers remain 
puzzled by the condition’s 
molecular cause. 


HTT exon 1 (green), a shortened form of the protein huntingtin, which is implicated in Huntington’s disease, forms clumps in embryonic rat neurons. 


BY SARAH DEWEERDT 


In many respects, Huntington's disease is the
epitome of a well-understood hereditary
disease. The condition was comprehensively
described almost a century and a half ago¹. Its
symptoms are well-characterized: involuntary,
jerky movements known as chorea; difficulty
in coordinating voluntary movements; cognitive
impairment; and psychiatric issues such as
changes in mood. And its pattern of inheritance
is clear: a person with a parent who is affected
has a 50% chance of developing the disease.

Researchers know exactly where to find the
gene that is implicated in Huntington's disease.
Known as HTT, and located near the tip of the
short arm of chromosome 4, it was the first gene
to be pinned to a chromosome region through
genetic mapping techniques, in 1983 (ref. 2).
Ten years later, at the dawn of the genomic era,
scientists homed in on the sequence of HTT³.

The mutation that causes the disease is now
well-understood: an abnormal expansion of
a repetitive sequence of DNA comprising a
triplet of bases, cytosine (C), adenine (A)
and guanine (G), towards one end of the gene.
A person who carries a copy of HTT containing
40 or more of these triplets, or CAG repeats,
will develop the characteristic symptoms of 
Huntington's disease, typically around the age 
of 45, and then succumb to the condition within 
about two decades of the onset of motor prob- 
lems. A person who has between 36 and 39 such 
copies could develop Huntington's disease, but 
might not. Someone with 35 or fewer copies of 
the CAG repeat will not develop the disease. 
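The repeat-count thresholds described above can be expressed as a small classifier. This is an illustrative sketch only: `longest_cag_run` and `classify_htt_cag` are hypothetical helper names, the input is assumed to be a plain DNA string, and real diagnostic sizing of the HTT repeat (which sits next to a CAA interruption and is measured with specialized assays) is considerably more involved.

```python
import re

def longest_cag_run(seq: str) -> int:
    """Return the length, in triplets, of the longest uninterrupted CAG run."""
    # (?:CAG)+ greedily matches back-to-back CAG triplets; each match is one run.
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)

def classify_htt_cag(repeats: int) -> str:
    """Map a CAG repeat count onto the outcome categories given in the text."""
    if repeats < 0:
        raise ValueError("repeat count cannot be negative")
    if repeats >= 40:
        return "will develop"        # 40 or more repeats
    if repeats >= 36:
        return "might develop"       # 36 to 39 repeats
    return "will not develop"        # 35 or fewer repeats

# Hypothetical fragment: 42 CAG triplets followed by a CAA interruption.
fragment = "AT" + "CAG" * 42 + "CAACAGCCGCCA"
n = longest_cag_run(fragment)
print(n, classify_htt_cag(n))  # 42 will develop
```

Counting the longest uninterrupted run, rather than every CAG in the string, mirrors the idea that it is the length of the consecutive repeat that matters.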

And yet, fundamental aspects of the biology 
that underlies Huntington's disease remain a 
mystery. “Many genetic diseases are relatively 
straightforward at the molecular level,” says 
Ronald Wetzel, a structural biologist at the 
University of Pittsburgh in Pennsylvania. 
“The frustration we have with understanding 
Huntington's disease comes, in part, because 
of this background of our more-positive 
experience with other disease mechanisms.” 

For starters, the function of huntingtin, the 
protein encoded by HTT, isn't fully under- 
stood. More specifically, researchers don’t 
know what the portion affected by the muta- 
tion does. And they aren't sure why the mutant 
protein causes problems in cells, how those 
problems begin, or which of several forms of 
the protein is responsible. 


As a promising potential treatment for 
Huntington's disease — designed to silence the 
expression of HTT with a synthetic molecule
known as an antisense oligonucleotide that 
binds to messenger RNA — moves into clinical 
trials (see page S39), answering basic questions 
such as these becomes more important than 
ever. With such progress being made, ignorance 
of the disease’s biological basis should not be 
allowed to create a bottleneck for research. 
Knowing more about how the mutant protein 
leads to damage in nerve cells, in particular, 
could be crucial for the success of treatments 
for the condition. “You need to know how 
to most-selectively target that mRNA,” says 
Matthew Disney, a biochemist at the Scripps 
Research Institute in Jupiter, Florida. 


TRIPLET TROUBLE 

The DNA sequence CAG encodes the amino 
acid glutamine. The CAG repeats in HTT 
therefore lead to the production of a string of 
glutamines, known as a polyglutamine chain, 
which is abnormally long in people with the 
large numbers of repeats that are associated 
with Huntington's disease. But the function of 
the chain, and the reason for the existence of 
the CAG repeats, are unknown.
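Because each CAG codon specifies glutamine, the length of the repeat maps onto the length of the polyglutamine chain. The minimal sketch below translates a coding sequence with the standard genetic code; the helper names and the example fragment are hypothetical. It also shows one subtlety: the codon CAA encodes glutamine too, so a glutamine run can be longer than the uninterrupted CAG run that produced it.

```python
import re

# Standard genetic code, with codons ordered by bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(seq: str) -> str:
    """Translate a DNA (or RNA) coding sequence in frame 0; '*' marks stop codons."""
    dna = seq.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3  # ignore any incomplete trailing codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

def longest_polyq(protein: str) -> int:
    """Length of the longest run of glutamine (Q) residues."""
    return max((len(run) for run in re.findall(r"Q+", protein)), default=0)

# Hypothetical fragment: start codon, three CAGs, one CAA, one CAG, then CCG.
protein = translate("ATG" + "CAG" * 3 + "CAA" + "CAG" + "CCG")
print(protein, longest_polyq(protein))  # MQQQQQP 5
```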

A somewhat provocative explanation, 
proposed by Chiara Zuccato and colleagues
at the University of Milan in Italy, is that Hun- 
tington’s disease is the unfortunate by-product 
of an evolutionary advantage. The number of 
CAG repeats in HTT varies among species of 
vertebrate, and their expansion is greatest in 
humans. Huntingtin is essential for the devel- 
opment of the nervous system before birth; 
indeed, the researchers contend that CAG 
repeats might have contributed to the development
of the complexity of the vertebrate brain⁴.

But other researchers say that CAG repeats 
are just as likely to have arisen by chance, and 
that they have little effect until a threshold 
is crossed. “These sequences are difficult to 
copy,” says Cynthia McMurray, a biochemist
at Lawrence Berkeley National Laboratory in
California, “so sometimes the polymerase [the
enzyme responsible for copying DNA] stutters
around, and it can’t get through it.” This results
in extra copies being incorporated into the 
DNA, lengthening the repeat. 

Ten hereditary diseases are known to be 
caused by the expansion of CAG repeats in 
various genes, and other triplet-repeat diseases 
exist; CAG repeats just happen to be particularly
vulnerable to these polymerase ‘stutters’.

It’s not just the purpose of the polyglu- 
tamine chain that is difficult to unpick. The 
function of huntingtin is poorly understood 
and the protein might have multiple roles in 
cells. Although it is expressed most strongly 
in the brain, huntingtin is found throughout 
the body and has been shown to interact with 
more than 100 proteins. “That’s sort of amaz- 
ing,” says Stefan Kochanek, a gene-therapy 
specialist at University Hospital Ulm in 
Germany. “It’s a very large number of proteins.” 

Some experiments suggest that huntingtin 
helps to transport proteins around the cell. 
Other findings hint that it might play a part 
in transcription. It might also aid the proper 
folding of proteins, or help proteins to form 
complexes with one another. “The idea of there 
being one prime mechanism for what huntingtin
protein does is far from clear,” McMurray
says. “It does a lot of things.” 

Scientists have documented numerous bio- 
chemical abnormalities in animal models of 
Huntington's disease, as well as in people with 
the condition. “There are probably more things 
you can measure that are going off the rails in 
Huntington’s than stay on the rails,” says David 
Housman, a biologist at the Massachusetts 
Institute of Technology in Cambridge, who 
was involved in identifying HTT. The balance 
between protein synthesis and degradation, 
the function of cellular structures such as the 
endoplasmic reticulum, which is involved in 
protein processing, and the cell’s responses 
to stress, for example, become disrupted. But 
Housman says that the cause-and-effect rela- 
tionship between these observations, or what 
goes awry first, is unclear. That makes it difficult
to glean clues about huntingtin’s normal
function from observations of what goes wrong
in Huntington’s disease — as well as to design 
a treatment to target the initial steps in the 
development and progression of the condition. 

An analysis of the structure of huntingtin⁵
published in March supports the idea that it is 
some kind of molecular chaperone — a protein 
that aids the function of other proteins — says 
Kochanek, who led the work. It took a decade 
to come to fruition, he adds, because the pro- 
tein’s extreme flexibility made it difficult for 
researchers to get clear images of the structure. 

The analysis does not, however, clearly reveal 
the structure of the polyglutamine chain, so it 
cannot provide further information about 
the function of that part of the protein. Some
researchers suspect that it must do something,
even though it’s not clear what. Polyglutamine
chains often facilitate interactions
between proteins, and mice that lack this por- 
tion of HTT show behavioural and biochemical 
changes. So the question remains: how exactly 
does mutant huntingtin cause such problems? 


STICKY CHAINS AND CLUMPS 

Scientists have debated whether the damage to 
cells in Huntington’s disease stems from a loss 
of the normal function of huntingtin, a toxic 
effect that is unique to the mutant protein, or 
a combination of both. There’s also disagree- 
ment on which form of the mutant protein 
is the culprit — a question that is difficult to 
answer because many versions of huntingtin 
exist in the cell at the same time, owing to 
splicing and modification processes. “It’s 
like having too many viable suspects in an 
Agatha Christie novel,” Wetzel says. 

Some researchers think that full-length 
huntingtin is the prime suspect. Chains of glu- 
tamine tend to be sticky, McMurray says, and 
huntingtin containing too many glutamines 
in a row might adhere more strongly to other 
molecules than it would normally — gumming 
up the movement of proteins through the cell. 
She suggests that the resulting impairment of 
huntingtin’s transportation activity would dis- 
rupt metabolism and other cellular functions 
gradually, which is consistent with the slow 
progression of Huntington's disease. 

But many others instead point to a shortened 
form of huntingtin, known as HTT exon 1, 
in which exon 1 refers to the protein-coding 
region of HTT that contains the CAG repeats. 
It’s found only in people with Huntington’s 
disease and contains sequences that are not 
usually translated into protein. HTT exon 1 
could be more toxic than any other form of 
huntingtin. For example, mouse models in
which only HTT exon 1 is expressed show
all the main features of Huntington's disease⁶.
Meanwhile, studies in fruit flies suggest that of
all huntingtin fragments produced by the cell,
HTT exon 1 is the most harmful⁷.

If HTT exon 1 is the real cause of Huntington's 
disease, exactly how it exerts its damaging effect 
remains unclear. That’s because, compared with 
flexible full-length huntingtin, HTT exon 1 is 
a downright shape-shifter. It switches freely 
and rapidly between multiple conformations, 
making it almost impossible to isolate the effects 
of any particular one. 

Researchers have several hypotheses. 
One idea is that, similar to how full-length 
huntingtin containing many glutamines might 
gum up cells, the longer the polyglutamine 
chain of HTT exon 1 is, the more strongly the 
fragment can bind to other molecules. Other 
scientists blame the tendency of HTT exon 1 to 
aggregate with itself. Nerve cells from people 
with Huntington's disease contain clumps of 
huntingtin-related proteins, as well as smaller 
fibrils comprising thousands of copies of 
HTT exon 1. In turn, such aggregations can 
disrupt a variety of cellular functions, leading 
inexorably to neurodegeneration. 

Similar clumps are found in other 
neurodegenerative conditions such as 
Alzheimer’s disease, Parkinson's disease and 
amyotrophic lateral sclerosis. They also occur 
in all known diseases caused by expanded 
CAG repeats, including spinocerebellar ataxia. 

Wetzel and his collaborators have shown 
that polyglutamine chains acquire a much 
greater propensity to stick to each other and 
form aggregates when they reach 37 amino 
acids long. This observation provides a poten- 
tial link between the disease-causing threshold 
of the CAG repeats and the biochemistry of 
huntingtin. Still, others caution that huntingtin 
clumps might be the result, not the trig- 
ger, of the processes responsible for the slow 
progression of Huntington's disease. 

It’s tempting to think that such debates won't 
matter if treatments such as the antisense 
oligonucleotide in clinical testing can simply 
turn off huntingtin expression altogether. But 
some of the antisense molecules being tested 
target a portion of HTT that is distant from 
exon 1, Disney says. “Those antisense oligo- 
nucleotides are going to ablate full-length 
huntingtin, but they are not going to affect this 
mini version of huntingtin,” he says. Perhaps 
it’s a hint that huntingtin might be too complex 
for such simple solutions.


Sarah DeWeerdt is a freelance science writer 
in Seattle, Washington. 
GENETIC TESTING


Mark Newnham chose to discover whether he carries the gene mutation for Huntington’s disease. 


Darkness and light 


The confirmation that a person will develop Huntington’s 
disease can bring them more uncertainty — but also relief. 


BY SIMON ROACH 


Mark Newnham has seen the future,
and it’s etched on his father’s face.

Despite being in good health, the 
31-year-old knows that Huntington’s disease 
is coming — he just doesn't know when. 

Like many people living under the shadow of 
the condition, Newnham, who lives in London, 
first heard about Huntington's disease when it 
struck older members of his family. A great- 
uncle had been diagnosed with it at the end of 
his life. So when Newnham’s father started to 
develop the involuntary movements associated 
with the condition, he got tested for the gene 
mutation responsible. His father’s diagnosis 
meant that Newnham — who was 20 years old 
at the time — had a 50% chance of carrying 
the gene. “I didn’t know what Huntington’s 
disease was when my Dad told me that he had 
it,” Newnham says.

In the ten years since, his father’s symptoms 
have progressed to include more severe involun- 
tary movements, memory difficulties and mood 
swings. Later, his driving became worryingly 
erratic. Thinking about those years, in which 
his father’s mental health began to decline, is 
painful, Newnham says — he feels as though 
he has been witnessing someone “at war with 
himself, every day”. 

Throughout his early twenties, and despite 
his father’s illness showing him what might 
await, Newnham did not want to take the 
genetic test that would reveal whether he 
had inherited the mutation for Huntington's
disease. “I was more of a free spirit,” he says.
“I thought, ‘I don't need to know: I can get on
with it and just see if it happens later on in life.’”

That attitude reflected his general approach 
to life. As an actor and musician, he launched 
from one project to the next with little thought 
about what would come later. “I wanted to 
headline the Glastonbury Festival, and I 
wanted to become the next Johnny Depp,” he
laughs. “Those were my goals.” 

He continued on that path, he says, until he 
met his partner. Finding happiness and stabil- 
ity changed his perspective on Huntington's 
disease — especially when the couple thought 
about having children. Could he face rolling 
the dice when he might pass the condition to 
his offspring? 

Newnham sought genetic counselling 
through the UK National Health Service, during 
which he explored the impact that testing could 
have on his life. This involved considering his 
motivation for being tested, as well as changes 
that he might need to make in the event that 
he did have the mutation. Aside from the 
emotional strain that such testing can bring, 
it also raises questions about physical care and 
finances; the certainty of knowing you have the 
mutation can make it more difficult to get long- 
term health or life insurance. Genetic counsel- 
lors can help those who might be affected to 
pick through the entangled pros and cons. 

After three sessions, and given his and his 
partner’s desire to have children, Newnham
concluded that he needed to know his status
with respect to Huntington's disease. “We didn't
want to have a child without that certainty,” he
says. The result was not what he had hoped for.
Like his father, he carries the mutated gene. 

Newnham is yet to experience symptoms, 
and it could be decades before he shows signs 
of the disease. He still works as an actor and 
musician, but says that his priorities have 
changed. “The test results made me realize that 
what really drove me as a person before, and 
what ambitions I had, they’re not as important
now,” he says. The dream of performing
at Glastonbury will never be gone, but spend- 
ing time with family and friends seems more 
important. This shift in perspective has given 
him a quiet contentment, he adds. 

More willing to look ahead, Newnham and 
his partner immediately began to explore how 
they could have a child who would not carry 
the Huntington's disease mutation. 

“I wanted to make sure that I don't pass this
on to the next generation,” he says. That meant
going through a process called preimplanta- 
tion genetic diagnosis (PGD). 

In PGD, embryos created through in vitro 
fertilization (IVF) are screened for specific 
genetic disorders; only those without the 
related mutations are implanted. In England, 
up to three rounds of PGD are available at no 
cost to people who meet certain criteria. (In 
the United States, many health-insurance plans 
won’t cover the process, so people typically pay
US$15,000-25,000 for IVF with PGD.) 
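The 50% figure also helps to explain why PGD involves testing several embryos. The rough sketch below assumes that one parent carries a single mutant copy of HTT and that each embryo independently has a 50% chance of being mutation-free. It says nothing about whether IVF produces any embryos to test, and `prob_unaffected_embryos` is a hypothetical helper name.

```python
from math import comb

def prob_unaffected_embryos(n_embryos: int, k: int = 1, p_unaffected: float = 0.5) -> float:
    """Probability that at least k of n tested embryos lack the mutation.

    Assumes each embryo independently has probability p_unaffected of being
    mutation-free (0.5 when one parent carries a single mutant HTT copy).
    """
    if n_embryos < 0 or k < 0:
        raise ValueError("counts must be non-negative")
    # Binomial tail: sum over i = k..n of C(n, i) p^i (1 - p)^(n - i).
    return sum(
        comb(n_embryos, i) * p_unaffected**i * (1 - p_unaffected) ** (n_embryos - i)
        for i in range(k, n_embryos + 1)
    )

for n in (0, 1, 3, 5):
    print(n, prob_unaffected_embryos(n))
```

With no embryos the probability is zero, which is why a failed IVF step ends a PGD cycle regardless of genetics.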

Not everyone with a family history of 
Huntington's disease goes to these lengths. 
Some leave it to chance. And various religious 
groups have reservations about prenatal 
genetic-screening methods such as PGD 
because any embryos found to have genetic 
abnormalities will be destroyed. 

For those who do elect for PGD, the chances 
of success are low — as Newnham and his 
partner found out when they received the 
news that their journey towards parenthood 
had, for now, come to an end. The IVF part 
of the process, itself a complicated procedure, 
had failed and there were no embryos to test. 

For now, the couple are weighing up the 
options. Adoption is a possibility, but people 
who will go on to develop conditions such as 
Huntington’s disease tend to be at the bottom 
of the list because of their own care needs later 
in life, Newnham says. 

Despite the prospect of a life without 
children, Newnham does not regret his 
decision to get tested. At least, he explains, he 
is moving forward with his eyes open. And 
advances in research fill him with “immense 
hope” that some form of treatment will be 
available in his lifetime — too late for his 
father, perhaps, but soon enough that the risk 
of having a child with Huntington’s disease 
might no longer be one of life or death.


Simon Roach is a freelance writer in Glasgow, 
UK. 
The big hope for Huntington’s


A quarter of a century after its discovery, researchers are finally unlocking ways to
neutralize the gene behind Huntington’s disease. 


BY LIAM DREW 


European Huntington's Disease Network,
Sarah Tabrizi announced the launch of a
drug trial. Tabrizi, a neurologist and director 
of the University College London Huntington's 
Disease Centre, would be working with Ionis 
Pharmaceuticals of Carlsbad, California, to 
test the safety and tolerability of a drug candidate
called IONIS-HTTRx in people for the first
time. The drug had been designed to reduce 
the amount of protein being made by the gene 
that causes Huntington's disease. 

That gene is huntingtin (HTT). Inheriting 
just one mutated copy brings about a progressive
neurodegeneration that typically begins in a
person's forties. The condition’s most distinctive 
symptom is involuntary, jerky limb movements. 
This is preceded by subtler psychiatric symp- 
toms and followed by a disabling dementia. 

IONIS-HTTRx is an antisense oligonucleotide
(ASO): an artificial chain of 12-25 nucleotides
that is designed to prevent the production of
protein from a specific gene. In the trial, people
in the initial stages of Huntington's disease 
would receive four monthly injections of ASO 
directly into their cerebrospinal fluid (CSF) 
through a lumbar puncture. IONIS-HTTRx
was expected to diffuse into the brain, where it 
would suppress the production of the protein 
huntingtin in neurons. As well as ensuring that 
the drug had no adverse effects, Tabrizi and 
Ionis would use a new assay to measure levels 
of mutant huntingtin. 

The Huntington's disease community was 
excited about the trial — a way of silencing 
HTT has been sought since the gene was dis- 
covered in 1993. But the path to using ASOs 
as treatments had been rocky and the brain 
is notoriously difficult to target with drugs. 
Tabrizi recalls that after her presentation, colleagues
told her, “It’ll never reach the brain. It’s
never going to work.” 

In December 2017, however, a press 
release revealed that the doubters had been 
wrong — the trial had been a success. And in 
March 2018, Tabrizi unveiled the resulting data 
at the final session of the 13th Annual Hun- 
tington’s Disease Therapeutics Conference. 

Her most important slide showed decreases 
in the level of mutant huntingtin in trial par- 
ticipants’ CSF — indicative of reduced levels 
of the toxic protein in their brains — that were 
proportional to the amounts of drug the volunteers
had received. At the two highest doses,
production of the protein had, on average,
decreased by about 40%.

Sarah Tabrizi is helping to move innovative potential treatments for Huntington’s disease into the clinic.

“People started crying,” says Jeff Carroll, a 
neuroscientist who investigates Huntington's 
disease at Western Washington University in 
Bellingham. “Everybody who works in Huntington’s
disease long enough meets families and gets to
know them, so it becomes very personal.”
Carroll’s connection to his work runs particularly
deep. He began his career in neuroscience
after his mother was
diagnosed with Huntington's disease. Then, in
2003, he discovered that he, too, had the muta- 
tion for the condition. Looking at Tabrizi’s slide, 
Carroll thought, “This is a graph that is changing
my life.”

Tabrizi emphasizes that the trial did not 
show that IONIS-HTTRx is able to treat Huntington’s
disease. It demonstrated only that the
drug was safe and well-tolerated, and — cru- 
cially — that it engaged its target in the brain. 
“What we now have to do,” she says, “is to move
quickly forward to larger, longer trials to test 
whether the drug slows disease progression.” 

These trials will be run by pharmaceutical
company Roche of Basel, Switzerland.
In 2013, it partnered with Ionis to develop
IONIS-HTTRx and, after the initial trial, Roche
acquired the drug for US$45 million. The companies
will continue to collaborate. If all goes
to plan, a large phase III trial of IONIS-HTTRx
will commence later in 2018.

Assessing where the project stands at present, 
Tabrizi says she has become fond of a quote 
from a speech by Winston Churchill: “Now, this
is not the end. It is not even the beginning of the
end. But it is, perhaps, the end of the beginning.”


THE BEGINNING OF THE BEGINNING 

The molecular basis of ASO technology is the 
stuff of secondary-school textbooks. In double- 
stranded nucleic acids, the base guanine binds 
to cytosine, and thymine (in DNA) or uracil (in 
RNA) binds to adenine. This pairing enables 
DNA both to replicate and to supply cells with 
instructions for making protein. 

ASOs are designed to be complementary to 
the messenger RNAs of specific genes, which 
act as templates for protein production. When 
a cell is flooded with a particular ASO, the ASO 
will bind to its target mRNA, preventing it 
from guiding protein synthesis. 
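The base-pairing rule can be made concrete with a short sketch that derives an ASO sequence from a stretch of target mRNA. It assumes an unmodified DNA ASO written 5' to 3' and borrows the 12-25-nucleotide length range given earlier. The target sequence and the `design_aso` helper are hypothetical, and none of the chemical modifications that real ASO drugs rely on are represented.

```python
# Watson-Crick partner (as a DNA base) for each base in the target mRNA.
RNA_TO_DNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}

def design_aso(target_mrna: str, min_len: int = 12, max_len: int = 25) -> str:
    """Return a DNA antisense sequence (5' to 3') complementary to target_mrna.

    The two strands pair antiparallel, so the ASO is the reverse complement
    of the target. Only the base sequence is modelled, not backbone chemistry.
    """
    target = target_mrna.upper()
    if not min_len <= len(target) <= max_len:
        raise ValueError(f"target must be {min_len}-{max_len} nucleotides long")
    try:
        return "".join(RNA_TO_DNA_PARTNER[base] for base in reversed(target))
    except KeyError as err:
        raise ValueError(f"unexpected base {err.args[0]!r} in mRNA") from err

print(design_aso("AUGCAGCAGCAGCAG"))  # CTGCTGCTGCTGCAT
```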

Forty years ago, researchers at Harvard 
University in Cambridge, Massachusetts, 
demonstrated¹ that an ASO made of DNA
could stop the replication of a virus by blocking
viral protein production. And a year later, in
1979, it was shown² that the binding of ASOs to
mRNA not only prevented protein synthesis, 
but also triggered the degradation of mRNA. 

Given that ASOs could, in principle, sup- 
press the expression of any gene, these find- 
ings raised hopes of a fresh approach to the 
treatment of many diseases. And conditions 
such as Huntington's, caused by faulty genes 
that produce proteins with toxic effects, were 
seen as particularly promising targets. 

There was, however, a considerable problem:
DNA makes a lousy drug. A good drug
tends to distribute evenly through the
body, so that sufficient amounts reach
the desired target. To achieve this, the drug
must persist for a substantial amount of time.
On this score, ASOs face a major problem
when given to mammals, which produce high
levels of enzymes called nucleases that
digest nucleic acids.

Also, to be effective, ASOs must bind to tar- 
get mRNAs both tightly and with specificity. 
Complementary base pairing means that ASOs 
have preferred-partner mRNAs. However, 
because they are highly charged molecules, 
ASOs can also bind to mRNAs to which they 
are not perfectly complementary — thereby 
affecting the expression of other genes — as 
well as to proteins. Both events could give rise 
to off-target effects and toxicity issues. 
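Mismatch counting gives a crude feel for why imperfect complementarity matters. The sketch below scans a transcript for windows that an ASO would pair with, allowing a chosen number of mismatched bases. It is a simplification: real binding depends on thermodynamics, chemistry and RNA structure, not simply on mismatch counts, and it ignores binding to proteins entirely. The sequences and helper names are hypothetical.

```python
# Maps each DNA base in the ASO to the RNA base it pairs with.
DNA_TO_RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}

def aso_target(aso_dna: str) -> str:
    """RNA sequence (5' to 3') that pairs perfectly with the ASO."""
    return "".join(DNA_TO_RNA_PARTNER[b] for b in reversed(aso_dna.upper()))

def candidate_sites(aso_dna: str, transcript: str, max_mismatches: int = 0):
    """List (position, mismatches) for every window within max_mismatches."""
    target = aso_target(aso_dna)
    rna = transcript.upper()
    hits = []
    for start in range(len(rna) - len(target) + 1):
        window = rna[start:start + len(target)]
        mismatches = sum(a != b for a, b in zip(window, target))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits

aso = "CTGCTGCTGCTGCAT"                    # pairs with AUGCAGCAGCAGCAG
intended = "GGGAUGCAGCAGCAGCAGUUU"         # hypothetical on-target transcript
near_miss = "AUGCAGCAGCUGCAG"              # hypothetical transcript, one base different
print(candidate_sites(aso, intended))       # [(3, 0)]
print(candidate_sites(aso, near_miss))      # []
print(candidate_sites(aso, near_miss, 1))   # [(0, 1)]
```

Raising `max_mismatches` widens the net, which is the sequence-level analogue of an ASO binding to mRNAs to which it is not perfectly complementary.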

Making ASO-based treatments a reality 
has required the creation of molecules that
retain nucleic acid’s property of complementary
base pairing, while otherwise overhauling
ASO chemistry. 

Ionis has been a central player in this 
pursuit since its foundation in 1989. Frank 
Bennett, vice-president of research, set up 
the company’s laboratories and later led the 
development of IONIS-HTTRx. He stresses
that Ionis was “founded on an idea for a tech- 
nology”. Unlike most start-ups, it did not 
license technology from elsewhere. And the 
technology that it, like other companies devel- 
oping ASOs, has produced has undergone a 
substantial evolution. 

Initially, the company focused on treat- 
ments for viral infections and cancer. It used 
first-generation ASOs, the backbones of 
which had already been chemically modified 
to differ from those of DNA, which fraction- 
ally increased their resistance to nucleases. 
In 1998, Ionis’s drug fomivirsen (Vitravene) 
became the first ASO to be approved by the US Food
and Drug Administration, for treating an eye infection
called cytomegalovirus retinitis in people with AIDS.
But after advances in treating HIV infection
that maintained the immune response, the 
drug fell out of use. Other early ASOs were 
clinical failures. In the mid-1990s, several 
companies who had invested heavily in ASO 
technology withdrew from the field. 

The frustration extended to basic research. 
The first published report of an attempt to 
suppress the production of huntingtin in mice 
using ASOs described a failed experiment’. 
That study’s lead researcher, Ole Isacson, who 
works on neurodegeneration at the Harvard 
Stem Cell Institute in Cambridge, recalls the 
conflicted time that followed the description 
of HTT in 1993. “People were so enthralled by 
genetics,” he says. “They thought if you had the 
gene, you had the cure. Those of us with direct 
experience of working on disease models didn't 
get that feeling.”

Bennett concurs. The discovery of HTT 
piqued his interest in Huntington’s disease, 
but “at the time, we weren't ready”, he says. 
Instead, Ionis focused on improving the sta- 
bility of ASOs. Second-generation ASOs were 
developed in the late 1990s and early 2000s by 
incorporating further chemical modifications 
that increased resistance to digestion by nucle- 
ases. Isacson says that the enhanced stability 
of present ASOs, which can act for months, 
compared with the short-lived molecules that 
he used in 1997, is “one of the most remarkable 
improvements in technology that I’ve seen”. 
Only after Ionis had developed such ASOs did 
Bennett begin to tackle genetic disorders affect- 
ing the brain. 

To do so, Ionis forged a collaboration, in 2003, 
with Don Cleveland, a neuroscientist at the 
University of California, San Diego, that aimed 
to treat a genetic form of amyotrophic lateral 
sclerosis, or motor-neuron disease. Then, in 
2006, buoyed by progress they had made using 
mouse models of that condition, Ionis and 
Cleveland began to work on Huntington’s dis- 
ease. In 2012, they published studies* showing 
ASOs that target HTT mRNA could reverse
Huntington’s-disease-like symptoms in mouse 
models of the condition, alongside a demonstra- 
tion that ASOs downregulate huntingtin 
production in the brains of rhesus macaques. 
After this proof-of-concept work, Ionis 
designed and validated an ASO that would 
work in people, established the best way to 
deliver it to the brain — optimizing the lum- 
bar-puncture procedure, for example — and 
then determined the parts of the brain that 
the drug was most likely to reach. Finally, 
with Tabrizi, Roche and the CHDI Founda- 
tion (a US non-profit organization that funds 
research on Huntington's disease), it developed 
the assay’ to track levels of mutant huntingtin. 


THE END OF THE BEGINNING 

The optimism that surrounds IONIS-HTTRx stems from the well-established link between reduced levels of mutant huntingtin and improvements in symptoms in animal models of Huntington’s disease. Yet some researchers are concerned that IONIS-HTTRx suppresses not only the production of mutant huntingtin, but also synthesis of the normal protein.

This is because selectively targeting mRNA from the mutated copy of HTT — leaving mRNA from the normal copy untouched — poses a huge technical challenge. The mutation that causes Huntington’s disease is an overlong run of the nucleotide triplet CAG (see page S36): normal HTT contains 17–35 consecutive such triplets, whereas in people with the condition, at least one copy of the gene has 36 or more in a row. Consequently, an ASO of around 20 nucleotides that targets CAG repeats would bind to both normal and disease-causing versions of HTT mRNA. And, problematically, more than 50 other human genes also contain 10 or more CAG repeats, which means that targeting the sequence could produce unwanted side effects.
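
The specificity problem can be made concrete with a minimal Python sketch. Everything in it is a toy assumption rather than real data. The flanking sequences, the repeat counts (20 and 45) and the "other gene" are hypothetical, and sequences are written in DNA letters. Binding is modelled as a perfect Watson–Crick match, which ignores the mismatch tolerance and chemistry of real ASOs. The helper names are illustrative only.

```python
# Toy model: hypothetical flanks around a CAG tract; not the real HTT transcript.
# Sequences are written as DNA (cDNA) for simplicity, so T stands in for U.
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA/RNA string, returned in DNA letters."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_cag_run(seq: str) -> int:
    """Number of consecutive CAG triplets in the longest uninterrupted run."""
    best = 0
    for start in range(len(seq)):
        n = 0
        while seq.startswith("CAG", start + 3 * n):
            n += 1
        best = max(best, n)
    return best


def binding_sites(target: str, aso: str) -> list[int]:
    """Start positions where the ASO pairs perfectly (no mismatches) with the target."""
    site = reverse_complement(aso)  # the target stretch the ASO is complementary to
    k = len(site)
    return [i for i in range(len(target) - k + 1) if target[i:i + k] == site]


FLANK_5, FLANK_3 = "ATGGCGACC", "CCGCCGCCG"  # hypothetical flanking sequences
normal = FLANK_5 + "CAG" * 20 + FLANK_3      # within the normal range
mutant = FLANK_5 + "CAG" * 45 + FLANK_3      # expanded (36 or more)
other_gene = "GGT" + "CAG" * 10 + "TTA"      # stand-in for an unrelated gene with 10 repeats

# A 20-nucleotide ASO complementary to a stretch of pure CAG repeat sequence.
aso = reverse_complement(("CAG" * 7)[:20])

for name, seq in [("normal", normal), ("mutant", mutant), ("other gene", other_gene)]:
    print(name, longest_cag_run(seq), "repeats,", len(binding_sites(seq, aso)), "binding sites")
```

Run as written, the script finds perfect binding sites on the normal copy, the expanded copy and the stand-in gene alike. An ASO built purely from repeat sequence has no way to tell them apart.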

An alternative approach to targeting HTT mRNA is being developed by Wave Life Sciences in Cambridge, Massachusetts. Its strategy takes advantage of functionally insignificant differences that can often be found between a person’s two copies of HTT. If their disease-causing gene differs from their normal copy by a single nucleotide — an A, for example, instead of a C — an ASO can be designed to target only the HTT mRNA containing the substituted nucleotide, a mutation known as a single nucleotide polymorphism (SNP). “We could make those SNPs into therapeutic targets for drugs,” says Paul Bolno, Wave’s chief executive.

Wave was founded on an innovative means of manufacturing ASOs, in which the intrinsic asymmetry — or ‘handedness’ — of each nucleotide is specified during ASO synthesis. But the company’s SNP-targeting approach to Huntington’s disease also requires technical innovation in genotyping. If a physician were to prescribe a drug that suppresses a gene on the basis of it containing a particular SNP, he or she would need to be certain that the SNP is in the patient’s disease-causing version: inadvertently suppressing the normal copy while leaving mutated HTT unaffected could accelerate the disease. In conventional gene sequencing, DNA from a person’s two copies of a gene is combined — it’s possible to discover which mutations the person has, but not on which chromosome (of the pair) they are found. To determine that in Huntington’s disease, the sequencing reaction must follow the same strand of DNA from the region of CAG repeats to the SNP that is being used to differentiate between versions of the gene. Wave says that its sequencing platform does exactly this. But to meet regulatory approval, the error rate will have to be essentially zero.
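
The phasing requirement can be illustrated with a small Python sketch. It assumes idealized, error-free reads that each come from one chromosome. The read layout, the 5-base distance between the repeat and the SNP, and the function names are all hypothetical, and nothing here reflects how Wave's platform actually works.

Pooling the SNP bases from all reads shows only that both A and C are present. Grouping each read's SNP base by the repeat length measured on that same read links A to the expanded copy.

```python
from collections import Counter

EXPANDED_MIN = 36  # 36 or more CAG repeats marks the disease-causing copy (from the text)


def cag_run(read: str) -> tuple[int, int]:
    """Return (repeat_count, end_index) for the longest uninterrupted CAG run in a read."""
    best_count, best_end = 0, -1
    for start in range(len(read)):
        n = 0
        while read.startswith("CAG", start + 3 * n):
            n += 1
        if n > best_count:
            best_count, best_end = n, start + 3 * n
    return best_count, best_end


def phase_snp(reads: list[str], snp_offset: int) -> dict[str, str]:
    """Link the SNP base to the expanded or normal copy using reads that span both sites.

    snp_offset is a hypothetical fixed distance from the end of the CAG run to the SNP.
    Reads that stop before the SNP are skipped, because they cannot be phased.
    """
    votes = {"expanded": Counter(), "normal": Counter()}
    for read in reads:
        count, end = cag_run(read)
        pos = end + snp_offset
        if count == 0 or pos >= len(read):
            continue
        copy = "expanded" if count >= EXPANDED_MIN else "normal"
        votes[copy][read[pos]] += 1
    # Majority vote per copy; a real assay would need many more reads and an error model.
    return {copy: c.most_common(1)[0][0] for copy, c in votes.items() if c}


def make_read(repeats: int, snp_base: str) -> str:
    """Hypothetical read layout: flank + CAG run + 5-base spacer + SNP base + flank."""
    return "GGA" + "CAG" * repeats + "TTTTT" + snp_base + "GGT"


reads = [make_read(44, "A"), make_read(44, "A"), make_read(18, "C"),
         make_read(18, "C"), "GGACAGCAG"]  # the last read is too short to reach the SNP

# Pooled (unphased) view: both bases are seen, but not which copy carries which.
print(Counter(r[cag_run(r)[1] + 5] for r in reads[:4]))
# Phased view: each base is tied to the repeat length found on the same read.
print(phase_snp(reads, snp_offset=5))
```

A real assay would also have to cope with sequencing errors and with reads that do not reach the SNP. It would also have to meet the near-zero error rate that regulators would expect. The simple vote here only stands in for that machinery.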

The company has now begun separate phase I trials of two ASOs that each target one of the two most common SNPs in mutated HTT. The approach represents a personalized route to treating Huntington’s disease — only people with the targeted SNPs can benefit. Unfortunately, about 30% of those with the condition have neither SNP. Bolno says that Wave is looking for further SNP targets.

The need for specificity is contentious. Bolno points to studies in mice, in which switching off the production of normal huntingtin causes deleterious effects, as evidence that suppressing both forms of huntingtin in people might have unwanted consequences.

Ionis and Tabrizi, however, disagree. Although studies in mice show that normal huntingtin is crucial for early development, they emphasize that, in adult animals, its function is much less important. Besides, they point out, ASOs do not reduce levels of huntingtin to zero. They cite other studies in mice in which lowering but not totally removing huntingtin had no adverse effects. What’s more, they say, rhesus macaques given IONIS-HTTRx for up to nine months showed no detrimental effects.

Both Ionis and Wave are working with Carroll to resolve this pivotal issue. Carroll despairingly cites two almost identical studies” in mice that gave radically different results. “Right now, we don’t know enough,” he says. “And you can only get so much from mice.”

Another question about the long-term future of ASOs concerns the plausibility of giving lumbar punctures to patients on a regular basis, potentially for decades. ASOs that can cross the blood-brain barrier, which do not need to be introduced directly into the CSF, are still in early development. And there are methods other than a lumbar puncture for getting drugs into the CSF: some people with multiple sclerosis, for example, use implanted pumps for the task. Tabrizi says that finding an alternative delivery system is a “post-approval problem” — given that no current treatment stops the condition, she notes, people with Huntington’s disease accept that they’ll need to visit a hospital regularly to receive treatment.

Other potential genetic treatments hold the advantage of having to be administered only once. At the moment, the most viable such alternative is RNA interference (see ‘Another way to halt huntingtin’). Looking further ahead, the gene-editing tool CRISPR could correct HTT directly (S42). But although the enthusiasm that surrounds this technique is valid, the history of ASOs shows that much work is needed to move exciting ideas into the clinic.

“ASOs are reaching prime time,” says Tabrizi, for Huntington’s disease and, potentially, brain diseases in general. She can now smile about the doubters that she encountered on announcing the trial. “This is science,” she says. “When you’re trying to develop new therapies, you’re always going to have sceptics, and you have to just carry on with what you believe in. And I believed in this.”

Carroll, like many other researchers and people who have been touched by Huntington’s disease, has been riding a fresh wave of hope since Tabrizi revealed results of the ASO trial in March. “Ever since I got my diagnosis,” he says, “I have operated under the assumption that I’ll die at the same time my mum did — that I’ll get sick when she did. Seeing that graph was the first time I’ve believed that it could be better.”


RNA INTERFERENCE


Another way to halt huntingtin


Antisense oligonucleotides (ASOs) are not the only way to suppress protein synthesis by targeting messenger RNAs. Cells already have an intrinsic mechanism for stifling mRNA that involves producing short molecules of RNA that are complementary to mRNAs. When a short RNA binds to an mRNA, protein synthesis is prevented through a phenomenon known as RNA interference (RNAi).

Most approaches to harnessing RNAi involve introducing a gene into cells that then constantly produces a desired short RNA to permanently suppress synthesis of a target protein. Three companies — Spark Therapeutics in Philadelphia, Pennsylvania; UniQure in Lexington, Massachusetts; and Voyager Therapeutics in Cambridge, Massachusetts — are pursuing RNAi approaches to treating Huntington’s disease.

Each is developing a virus-based vector — which is incapable of replicating and therefore poses no threat to health — that delivers a gene encoding a short RNA that inhibits huntingtin production. However, this one-off genetic treatment raises potential safety concerns. Permanently modifying tissues to produce RNA might reduce the need for repeated dosing, but the procedure cannot be reversed if side effects develop. And, unlike ASOs, which need only to be introduced to the cerebrospinal fluid, these vectors must be administered in the vicinity of the cells to be treated. In Huntington’s disease, researchers have focused on the striatum — an area in the middle of the brain that is most visibly affected by the condition.

Controversy exists as to whether suppressing huntingtin in the striatum alone will halt Huntington’s disease. This brain structure has an important role in the condition’s progression, and Pedro Gonzalez-Alegre, a neurologist at the University of Pennsylvania’s Perelman School of Medicine in Philadelphia, thinks that “improving one key brain region will have benefits beyond that area itself”. But ultimately, he concedes, “we will answer this question only when we do it in humans.” Perhaps, he suggests, the strong RNA-based suppression of huntingtin in the striatum might be supplemented with ASOs to more modestly suppress the protein throughout the brain and body. L.D.


Liam Drew is a science writer based in 
London. 
Beverly Davidson and Alex Monteys are using gene editing to inactivate the mutated gene huntingtin.


To cut is to cure 


One-off treatments that target the brain in Huntington’s 
disease must meet strict safety and efficacy requirements. 


BY MICHAEL EISENSTEIN 


Beverly Davidson is well insulated from the hype of gene therapy, having spent decades working in the field. During that time, she has grappled with the harsh realities of turning flashy and potentially transformational technologies into clinical applications. But when she heard about the genome-editing technology CRISPR, she was instantly intrigued. “As soon as those first papers came out, we started playing with it,” says Davidson, a specialist in neurodegenerative disease at the Children’s Hospital of Philadelphia in Pennsylvania.

Like most other neurological disorders, Huntington’s disease has proved to be a costly and frustrating target for drug developers. But it also has distinctive features that make it a good match for treatments that target genes. It arises from a mutation in a single gene that encodes the protein huntingtin, and a disease-causing copy of the gene can be readily distinguished from a normal copy by the presence of an overlong stretch of a repeated triplet of nucleotides, CAG (see page S36). Before turning to CRISPR, Davidson and her colleagues had some success in treating animal models of Huntington’s disease with RNA interference (RNAi), which uses synthetic molecules of RNA to prevent the production of mutant huntingtin — although it took them a considerable amount of time to get there. “We’ve focused the last 17 years on RNAi-based approaches,” says Davidson. However, both this and a promising related treatment for Huntington’s disease that involves antisense oligonucleotides (S39) will probably require long-term, repeated administration to provide sustained benefits.

By contrast, CRISPR could achieve the same benefits through a single dose that permanently inactivates the defective gene with remarkable efficiency, as Davidson’s team demonstrated last year’, both in cells from people with Huntington’s disease and in mouse models of the condition. “I was surprised how easy it was — I think that’s the beauty of the system,” she says. In the past five years, several teams of researchers have independently shown that genome editing can reliably eliminate the gene that encodes mutant huntingtin, thereby halting the production of the toxic protein and its accumulation into clumps in experimental models.


But clearing protein clumps in mice is of questionable value when researchers often struggle to translate such findings into treatments for people — in general, potential therapies for brain disorders have a long history of failure and disappointment in clinical trials. Accordingly, the early adopters of CRISPR are trying to obtain clearer evidence of its probable clinical benefits while grappling with thorny questions about its safety, efficacy and delivery that must be answered before trials in people can take place. “I believe we can now seriously consider clinical strategies to edit huntingtin,” says Nicole Déglon, a neurologist at the Lausanne University Hospital in Switzerland, “but I would say we are still at the very beginning of the story.”


TO THE LETTER 

The targeted DNA-snipping capabilities of CRISPR evolved in bacteria as a defence against viruses that shoehorn their genomic material into their microbial hosts. The system uses a short sequence of RNA known as a guide RNA, which can pair with a complementary DNA sequence. Researchers have learnt how to target almost any genomic sequence by engineering an appropriate guide RNA. They couple it with an enzyme called Cas9, which can then cut both strands of a DNA sequence of interest at a specific site. Because the DNA-repair mechanism of cells is sloppy, it typically produces insertions or deletions that inactivate the affected gene.
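
The targeting step can be sketched in Python. The article does not give these details, so the sketch assumes the widely used Cas9 from Streptococcus pyogenes (SpCas9). That enzyme recognizes a 20-nucleotide target immediately followed by an NGG motif (the PAM) and cuts both strands three bases upstream of that motif. The guide and DNA sequences are hypothetical, the guide is written in DNA letters for easy comparison, and `find_cut_sites` is an illustrative name, not a library function.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_cut_sites(dna: str, guide: str) -> list[tuple[str, int]]:
    """Find where an SpCas9-style nuclease guided by `guide` would cut `dna`.

    Assumes a perfect match followed by an NGG PAM, on either strand.
    Returns (strand, cut_index) pairs; cut_index is the top-strand position of
    the first base after the double-strand break.
    """
    dna, guide = dna.upper(), guide.upper()
    k = len(guide)
    rc_guide = reverse_complement(guide)
    sites = []
    for i in range(len(dna) - k - 2):
        # Top strand: target, then N, G, G; the break falls 3 bases before the PAM.
        if dna[i:i + k] == guide and dna[i + k + 1:i + k + 3] == "GG":
            sites.append(("+", i + k - 3))
        # Bottom strand: the top strand reads C, C, N, then the reverse-complemented target;
        # the break falls 3 bases into the target from its PAM-adjacent end.
        if dna[i:i + 2] == "CC" and dna[i + 3:i + 3 + k] == rc_guide:
            sites.append(("-", i + 6))
    return sites


guide = "GACGTTACCGGATCAAGCTT"          # hypothetical 20-nt guide, in DNA letters
top = "AAAA" + guide + "TGG" + "AAAA"   # target on the top strand
bottom = "TTTT" + "CCA" + reverse_complement(guide) + "TTTT"  # target on the bottom strand
print(find_cut_sites(top, guide))
print(find_cut_sites(bottom, guide))
```

The sloppy repair that follows the break, which introduces the inactivating insertions or deletions, is deliberately not modelled here.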

One of the first decisions that would-be editors have to make is whether to eliminate the gene that encodes huntingtin altogether, or to selectively target the repeat-laden mutated copy. Although the function of huntingtin remains poorly understood (S36), it is crucial for early development. “If you knock out huntingtin in mice, they die in the womb,” says Jong-Min Lee, a neurogeneticist at Massachusetts General Hospital in Boston. However, Xiao-Jiang Li and colleagues at Emory University School of Medicine in Atlanta, Georgia, have obtained evidence from mouse studies’ that the depletion of huntingtin in the brain might not be detrimental when it occurs in adulthood. His team subsequently demonstrated’ that a CRISPR–Cas9 approach that eliminates huntingtin can clear clumps of the protein from the brain with no apparent adverse effects in a mouse model of Huntington’s disease (see ‘Cutting down on huntingtin’), although he is cautious about drawing too firm a conclusion. “We didn’t find any obvious phenotype or neuropathology, but we still don’t know whether there was some sort of functional impact,” says Li.

Most researchers are therefore erring on the side of caution by designing guide RNAs that recognize sequences found only in the mutated gene. This was the approach that Davidson’s team pursued, and Lee and colleagues also showed that they could make edits with remarkable accuracy in cells that were collected from a person with Huntington’s disease, by designing guide RNAs that recognize sequence variations found only on the chromosome that contains the mutated gene’. “The specificity is excellent,” says Lee, noting that the chromosome that bears the normal copy of the gene is consistently unaffected in treated cells. Achieving this in a clinical setting would require a level of personalization, but Lee has collected genomic data from more than 4,000 people with Huntington’s disease and identified some informative patterns of sequence variation that are strongly associated with mutated copies of the gene that encodes huntingtin. “With just a couple of CRISPR designs, you could easily target more than 50% of patients,” he says.

Another concern is off-target editing, in which genes other than the target are modified inadvertently — with potentially disastrous consequences. Software can be used to predict probable off-target edits and to help researchers pick distinctive guide RNAs with reasonable confidence. But clinical researchers do need to consider the effects that CRISPR might have when used over the longer term. “We should not apply this to humans if we have permanent expression of Cas9 in the brain,” says Déglon. Unfortunately, most systems for getting the CRISPR machinery into the brain rely on its delivery by viral vector, which could lead to Cas9 being produced indefinitely. Over time, the enzyme might wreak irreparable genomic damage on healthy neurons. A possible solution entails using synthetic nanoparticles to facilitate the one-time delivery of the enzyme and guide RNA, although this work is still at an early stage.
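
Off-target prediction software performs the kind of search sketched below in Python, in a much-simplified form. The sketch slides along one strand and keeps only sites next to an NGG motif, again assuming SpCas9. It then reports sites within a chosen number of mismatches of the guide.

Real tools are far more sophisticated: they scan whole genomes on both strands and often weight mismatches by position. The sequences, the three-mismatch cut-off and the function names here are hypothetical.

```python
def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def off_target_candidates(dna: str, guide: str,
                          max_mismatches: int = 3) -> list[tuple[int, int, str]]:
    """Scan the top strand for guide-like sites followed by an NGG PAM.

    Returns (start, mismatch_count, site) for every site with at most
    max_mismatches substitutions relative to the guide. A perfect match
    (0 mismatches) is the intended target; anything else is a potential off-target.
    """
    dna, guide = dna.upper(), guide.upper()
    k = len(guide)
    hits = []
    for i in range(len(dna) - k - 2):
        if dna[i + k + 1:i + k + 3] != "GG":  # require the NGG PAM (assumed SpCas9)
            continue
        site = dna[i:i + k]
        mm = mismatches(site, guide)
        if mm <= max_mismatches:
            hits.append((i, mm, site))
    return hits


guide = "GACGTTACCGGATCAAGCTT"   # hypothetical guide (DNA letters)
near = "CT" + guide[2:]          # differs from the guide at 2 positions
far = "CTGCAA" + guide[6:]       # differs from the guide at 6 positions
dna = ("AAAA" + guide + "TGG" + "AAAA" + near + "AGG"
       + "AAAA" + far + "CGG" + "AAAA")

for start, mm, site in off_target_candidates(dna, guide):
    label = "intended target" if mm == 0 else "possible off-target"
    print(start, mm, site, label)
```

Ranking candidate guides by how many near-matches they have elsewhere is one simple way to "pick distinctive guide RNAs". With the default cut-off, the 6-mismatch site is ignored, while the 2-mismatch site is flagged.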

Déglon’s team has devised a promising variant of CRISPR–Cas9 called KamiCas9, which includes a self-destruct button for Cas9. It uses two guide RNAs — one to target the gene encoding huntingtin, and another to target the gene encoding Cas9. This means that, after a brief flurry of activity by Cas9, production of the DNA-dicing enzyme stops permanently, which dramatically reduces the risk of collateral damage. She notes that several weeks after conventional CRISPR–Cas9 was applied to neural cells derived from people with Huntington’s disease, low levels of off-target editing were detected — roughly 2% of modified cells received unwanted edits at a site that is particularly susceptible to off-target editing’. By using KamiCas9, her team was able to reduce that effect dramatically — only 0.5% of such modified cells had off-target edits. “We did not see any difference in terms of efficacy, which is really good news,” says Déglon.
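
Expressed as arithmetic, the two figures Déglon quotes correspond to a roughly fourfold reduction. That is a relative drop of about 75%, or 1.5 percentage points in absolute terms:

$$\frac{2\%}{0.5\%} = 4, \qquad \frac{2\% - 0.5\%}{2\%} = 0.75$$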


BURDEN OF PROOF 

Such concerns are of little relevance unless editing with CRISPR can be shown to change the course of a disease — something that is difficult to demonstrate through experiments with mice. Li’s team has been able to alter Huntington’s disease at the molecular level’ by sharply reducing the production of mutant huntingtin, which forms the toxic clumps that drive the progression of the condition. “We have shown that, in an injected area of the mouse brain, probably more than 90% of cells do not contain huntingtin aggregates,” says Li. This effect was accompanied by modest yet measurable improvements in motor function. However, as with many animal models developed for other diseases, the mouse models that researchers use to investigate Huntington’s disease are poor surrogates for what happens in people with the condition. “Our model has mild motor phenotypes that show up later in life,” says Davidson. “It doesn’t have any overt, robust neurodegeneration like you would see in a human patient.”


CUTTING DOWN ON HUNTINGTIN

The mutant protein huntingtin (green fluorescence) is abundant in brain tissue gathered from a mouse model of Huntington’s disease (top), but a CRISPR-based intervention that targets the gene encoding huntingtin greatly reduces production of the toxic protein (bottom).


To some extent, this issue reflects the lifespan of mice — one or two years is not enough time in which to accurately map a degenerative disease that normally unfolds over decades. And there are also fundamental differences in the function and organization of rodent brains compared with those of larger mammals. However, Li’s team has developed a promising pig model’ of the condition that reflects the neurodegeneration and the motor and behavioural defects observed in people more closely than any mouse model so far. “Small animals and large animals exhibit very different pathological changes and behavioural changes,” says Li.

These improved models will also help researchers to get a handle on how many brain cells must undergo gene editing to obtain clinical gains — useful information given the impracticality and undesirability of bathing the brain in CRISPR-laden viral vectors. The striatum, a brain structure that governs both movement and cognition, is a prominent casualty of Huntington’s disease, and work with antisense oligonucleotides and RNAi suggests that efficient targeting with CRISPR could help to prevent the death of neurons in the area. Davidson thinks that cutting the production of mutant huntingtin in the striatum by half might be sufficient to halt disease progression or even prevent its onset.

For those already in the grip of Huntington’s disease, there are hints that genomic repair could provide a partial rebound. “Neurons may have a lot of capacity to get rid of mutant protein if you break the continuous formation of new aggregates,” says Li. A preventive approach, however, could one day enable individuals to avert their genetic destiny, long before the onset of disease. Indeed, Huntington’s disease is among the few disorders that can be confidently predicted using a genetic red flag. But even if CRISPR-based treatment amasses a strong body of preclinical data to support its use in Huntington’s disease, initial clinical testing will almost certainly focus on people with symptoms, for whom improvements in motor and cognitive function can be measured in a reasonable timeframe. “Then, based on the results of the first trials showing the absence of potential side effects, they might consider early-stage or even presymptomatic patients,” says Déglon.

The brain will not be the first clinical proving ground for CRISPR. Instead, initial forays will probably be aimed at conditions such as haemophilia, which can be treated with cells that have already been genetically manipulated in the laboratory. The brain remains a daunting target because of its biological complexity, relative inaccessibility and irreplaceable function. But the parallel surge in the clinical development of gene therapy and oligonucleotide-based interventions has cleared a path for testing the potential of CRISPR in treating Huntington’s disease. Even at this early stage, Davidson is optimistic. She is collaborating with Intellia Therapeutics in Cambridge, Massachusetts, which was co-founded by CRISPR pioneer Jennifer Doudna, to address the technical challenges that are involved in moving her research into the clinic. “I hate to say this, because I probably gave these sorts of numbers for RNAi, but with further advances in delivery, I could envision doing clinical testing within five years,” says Davidson. “I don’t think it’s particularly far off.”


Michael Eisenstein is a freelance science 
writer in Philadelphia, Pennsylvania. 
Elli Hofmeister began to show signs of Huntington’s disease at an early age.


PAEDIATRICS 


Ahead of time 


Huntington’s disease is not just a condition of middle 
age — it can affect children and teenagers, too. 


BY ELIE DOLGIN 


Elli Hofmeister started to lag behind in school when she was 8 years old. By the age of 9, she needed an extra hour of tutoring each night to keep up. Elli’s family chalked up her problems to a learning disability. But when Elli, at age 13, began to limp and slur her speech, “it all just started clicking,” says her mother, Camille Tulenchik, a hair stylist from Maple Lake, Minnesota.

When Tulenchik was pregnant with Elli, she consulted a genetic counsellor because her boyfriend at the time had a family history of Huntington’s disease. The boyfriend didn’t know whether he had inherited a mutated copy of the gene huntingtin, which is responsible for the condition; if he had, there would be a 50% chance that Elli had done so, too. But if Elli did turn out to be a carrier of the gene, the counsellor explained, she probably would not develop symptoms until adulthood. Tulenchik recalls thinking, “We’ve got lots of time.”

It was only when Elli began to experience physical problems in her early teenage years that Tulenchik decided to read up on her daughter’s genetic risk. “I looked up Huntington’s and saw ‘juvenile’ and said, ‘Oh no.’”

When nineteenth-century physician George Huntington described the devastating neurological illness that now bears his name, he wrote that he knew of no cases in which the person affected had shown noticeable signs of disease before the age of 30. Yet the earliest documented case of juvenile Huntington’s disease (JHD) pre-dates his seminal 1872 report by almost a decade — and neurologists now estimate that about 5% of cases of Huntington’s disease are diagnosed before the person affected turns 20 (see ‘At the extremes’).

The main determinant of the age of onset is the number of repeats of a certain triplet of DNA bases in the gene huntingtin: a normal version of the gene contains 35 or fewer such repeats; 36 or more result in the formation of an unstable protein that causes Huntington’s disease. The greater the number of repeats, the more unstable the protein is, and the more likely a person is to become unwell as a youngster. Elli has 65 repeats, well beyond the loosely defined threshold of 50 repeats at which JHD becomes more common. Her father has only 44 repeats, but errors in DNA replication meant that Elli inherited an even longer mutated region.
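
The repeat thresholds in this paragraph can be written as a simple rule. The Python sketch below uses only the cut-offs quoted here: 35 or fewer, 36 or more, and the loosely defined 50. It is illustrative, not a diagnostic tool, and the function names and the example count of 20 are hypothetical.

```python
NORMAL_MAX = 35     # "a normal version of the gene contains 35 or fewer such repeats"
JHD_LOOSE_MIN = 50  # loosely defined threshold at which JHD becomes more common


def classify_repeats(n: int) -> str:
    """Rough category for a CAG repeat count, using only the thresholds quoted in the text.

    Illustrative only: not a clinical interpretation.
    """
    if n < 0:
        raise ValueError("repeat count cannot be negative")
    if n <= NORMAL_MAX:
        return "normal range"
    if n < JHD_LOOSE_MIN:
        return "disease-causing"
    return "disease-causing; juvenile onset more common"


def generational_change(parent: int, child: int) -> int:
    """Repeats gained (positive) or lost (negative) between parent and child."""
    return child - parent


for label, n in [("typical normal (hypothetical)", 20), ("Elli's father", 44), ("Elli", 65)]:
    print(f"{label}: {n} repeats -> {classify_repeats(n)}")
print("expansion from father to Elli:", generational_change(44, 65), "repeats")
```

Because the 50-repeat mark is described as loose, a more faithful treatment would express it as a tendency rather than a hard boundary.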

Just because someone has a large number of repeats, however, does not mean that they will show signs of Huntington’s disease during their schooldays. “There must be other factors that influence the onset age,” says Martha Nance, medical director of the Huntington’s Disease Clinic at Hennepin County Medical Center in Minneapolis, Minnesota. “We just don’t know what they are.”

In fact, much of JHD remains shrouded in mystery, largely because few researchers have studied the disease in young people. Take, for example, the Genetic Modifiers of Huntington’s Disease Consortium, which undertook the largest DNA-mapping study of genes associated with the progression of Huntington’s disease (GeM-HD Consortium, Cell 162, 516–526; 2015). Of the 4,082 participants in the study, only 29 had been diagnosed before the age of 20, according to neurogeneticist Jong-Min Lee, one of the consortium’s leaders at the Massachusetts General Hospital in Boston.
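
A rough calculation shows how thinly juvenile cases were represented. If, purely for comparison, the roughly 5% figure quoted earlier had applied to the consortium’s participants, about 204 early-diagnosed participants would have been expected rather than 29:

$$\frac{29}{4082} \approx 0.0071 \approx 0.7\%, \qquad 0.05 \times 4082 \approx 204$$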

In recent years, researchers’ interest in JHD has picked up — and slowly the spotlight is shifting to this unique population of patients. “For too long, JHD has been under the radar,” says Peg Nopoulos, a psychiatrist and neuroscientist at the University of Iowa in Iowa City. “It’s time to pay attention to the kids who are suffering from this disease.”


CATCH THE SIGNS 

For Nopoulos, filling in the missing data meant starting with a simple catalogue of the many ways in which symptoms differ between children and adults with Huntington’s disease. Among young people with the condition, muscle stiffness is perhaps the most common complaint. That’s because children typically develop rigidity as one of the initial movement-related symptoms, and rarely exhibit the jerky, involuntary movements known as chorea that characterize adult-onset disease. However, when Nopoulos and her colleagues surveyed caregivers of patients with JHD, they also learned of a range of other problems found nowhere in the medical literature.

As Nopoulos and her team reported last year (A. D. Moser et al. Neurodegener. Dis. Manag. 7, 307–315; 2017), more than three-quarters of the respondents said that their wards experienced tics, 69% said that their wards were in some type of pain, and around half said that their wards were dealing with moderate-to-severe itching. These symptoms were recorded rarely in adults, but seemed to be widespread in children with JHD. “It suggests that juvenile-onset Huntington’s disease is impacting on parts of the brain in a different way than in an adult-onset disease,” says Nance, who collaborated on the survey.

To further probe those neurological differences, Nopoulos has used magnetic resonance imaging to scan the brains of about 25 children with JHD (including Elli), as well as those of hundreds of healthy young people. A defining feature of Huntington’s disease is that nerve cells of the striatum, a motor-control region in the centre of the brain, shrivel and die as the disease progresses — and, indeed, in the study participants with JHD, “the striatum is just toast,” Nopoulos says.

However, the scans also revealed that as the striatum shrinks in these children, another movement-related brain structure — the cerebellum — gets larger. This “pathological compensation”, as Nopoulos calls it, could explain why youngsters with Huntington’s disease seem to skip the chorea stage of the condition and go straight to stiffness.

By growing too big, the cerebellum doesn’t just make up for the missing motor functions of the striatum; it overshoots the mark and puts the brakes on movement entirely.

Nopoulos presented these findings in February at the 13th Annual Huntington’s Disease Therapeutics Conference — at which one of the few other scientists to discuss data on JHD was Mahmoud Pouladi, a neurogeneticist at the A*STAR Translational Laboratory in Genetic Medicine and the National University of Singapore. Pouladi’s team coaxed stem-cell lines generated from children with Huntington’s disease to form what amount to 3D miniature brains. The disease is usually associated with neurodegeneration, but experiments with Pouladi’s brain-like structures suggest that it’s also linked to neurodevelopment — and that the greater the number of triplet repeats, the more abnormal that development is.

Another way to study the molecular basis of JHD — and to try to develop treatments to reverse the condition — is to use transgenic mouse models. Few scientists who genetically engineer mice to study Huntington’s disease set out explicitly to model JHD rather than adult-onset disease. But according to Gillian Bates, a molecular neuroscientist at University College London, that might be what the research community has done inadvertently. “All of our mouse models are models of the juvenile form of the disease,” she says.


Motor impairment associated with Huntington’s disease is rare in children and adolescents. People who carry the gene mutation that causes the condition develop symptoms on average at around the age of 40. Huntington’s disease can strike later in life, but this is also rare.

To observe neurodegeneration during the short life of a mouse — and over a time course that’s suitable for experimentation — “we often purposefully push the disease”, explains Cat Lutz, director of the mouse repository at the Jackson Laboratory in Bar Harbor, Maine. For Huntington’s disease, that means increasing the number of triplet repeats to a level that would cause childhood onset in people.

This protocol could explain why most mouse models show many hallmarks of JHD, including rigidity and susceptibility to seizures — and might even call into question the validity of extrapolating data from mice to adult-onset Huntington’s disease. It could also mean that scientists know more about the basic neurology of JHD than they realize.

Then again, those symptoms could just be a reflection of how Huntington’s disease manifests in a rodent, and might have nothing to do with the number of triplet repeats or types of the disease in people. The truth, says David Howland, director of research on new animal models for Huntington’s disease at the CHDI Foundation, a US non-profit organization, is that “we don’t know how good our models really are”.


MATTER OF SCALE 
More effort is being invested in developing tools for the clinical investigation of JHD. A working group of the European Huntington’s Disease Network, led by clinical geneticist Oliver Quarrell at Sheffield Children’s Hospital, UK, ran a five-year observational study that used the Unified Huntington’s Disease Rating Scale, the most widely used and best-validated metric of clinical progression (see page S46), to track 95 people who had been diagnosed with Huntington’s disease at or before the age of 25.

The results are not yet published, but Quarrell says that the evaluation tool was unsuitable for measuring motor functions in these young patients because it puts great emphasis on chorea and much less on symptoms related to rigidity. He and his colleagues are now working on a modified scale to better match the distinct features of JHD.


That tool will be important in light of a ruling 
by the European Medicines Agency stating 
that, from July 2018, companies that develop 
drugs for Huntington's disease will have to 
test such treatments in paediatric populations 
before the products can receive marketing 
approval. At present, all drugs used to manage 
the symptoms of JHD — including dopamine 
modulators, anti-seizure medications, anti- 
anxiety agents and muscle relaxants — are 
taken off-label. Elli, for instance, uses a drug 
that is commonly prescribed for Parkinson's 
disease to ease her stiffness, over-the-counter 
pain medicines to deal with aches, and physical 
therapy to stay as supple as possible. 

Her mother follows websites such as HDBuzz to keep on top of the latest drug trials. She then discusses options with Nance, Elli’s neurologist, but is yet to find anything promising that also accepts younger participants. To enrol in a study for one of the treatments that aims to silence the mutated gene huntingtin (see page S39), for example, volunteers need to be at least 25 years old. “Right now I feel like we are very limited in our options,” says Tulenchik.

Elli turned 20 in February. Three days a 
week, she attends a transition programme for 
young adults with special needs, where she 
helps to run the coffee shop. She also volun- 
teers at a nearby nursing home, decorating 
the bulletin board and cleaning bingo cards, 
swimming-pool noodles and musical instru- 
ments. For her most recent birthday, Elli 
celebrated by hosting a sleepover for only her 
closest female relatives, including her sister 
Violet, which meant that her brother Zander 
couldn't attend. “No boys allowed!” says Elli, in 
a slow and indistinct manner. 

They decorated masks, ate cake and ice cream, and stayed up after midnight, watching Fly Away Home, a feel-good, 1990s-era family drama about a teenager who teaches her pet geese to fly. “Our motto is: ‘Today is our best day’,” Tulenchik says. “We just focus on today.”


Elie Dolgin is a science writer in Somerville, 
Massachusetts. 
The endpoint is near


Better measures of efficacy are needed in trials of treatments for Huntington’s disease. 


BY KAT ARNEY 


Potential treatments for Huntington’s disease are starting to pass through clinical trials, and excitement is building among researchers. “I’ve been working in Huntington’s disease for more than 20 years, and we’re in a new era,” says Sarah Tabrizi, a clinical neurologist at University College London. “We’ve previously only had trials looking at drugs that relieve symptoms, but now we know the root cause of the disease, and we’re starting to see molecular therapies that target it,” she says.

Tabrizi led one such study, which tested whether short pieces of modified DNA known as antisense oligonucleotides (ASOs) could switch off the protein-coding messages transcribed from the gene that is mutated in people with Huntington’s disease. The trial received widespread media coverage in December 2017, and researchers, clinicians and patients hope that this gene-silencing approach could provide the first treatment to truly modify the disease (see page S39).

But beneath the positivity lurks a thorny issue for researchers, such as Tabrizi, who are developing treatments. “We absolutely have to be sure they are working,” she says. Unfortunately, this is not so easy to determine in Huntington’s disease.

Unlike in trials of cancer drugs, in which efficacy can be quantified with relative ease — tumours can be seen to shrink, grow or stay the same size — designing trials to demonstrate meaningful improvements in progressive neurological conditions such as Huntington’s disease is not straightforward.

The assessment tool used in almost all trials 
so far is the Unified Huntington's Disease Rating 
Scale (UHDRS). Developed in 1996 by the 
Huntington Study Group, an international col- 
laboration, the UHDRS enables doctors to score 
a person's overall physical and neurological 
fitness. By testing participants at regular inter- 
vals, investigators can work out whether a 
potential treatment is slowing the progression 
of the disease, compared with a placebo. 

“We've used the UHDRS for many years,” 
says Blair Leavitt, consulting neurologist at the 
University of British Columbia in Vancouver, 
Canada. “You can see reliable progression over 
a certain period, but we need better ways to 
measure it.” 

Symptoms such as movement difficulties 
or cognitive impairment can vary in severity 
from day to day. This makes it difficult to tell 
whether a treatment is having an impact on the 
disease or the patient is just having a good day. 
And, as Leavitt explains, the subjective nature 
of functional assessments such as the UHDRS 
means that they’re greatly susceptible to the 
power of the placebo effect. 


The reliability of such tools was thrown 
into the spotlight with the announcement of 
results from Pride-HD, a trial of pridopidine 
(Huntexil) run by Teva Pharmaceutical 
Industries in Petach Tikva, Israel. Although 
previous trials of drugs aimed at relieving 
symptoms were inconclusive, preclinical test- 
ing suggested that pridopidine might help to 
protect neurons from the damaging effects of 
mutant huntingtin, the toxic protein produced 
by people with Huntington's disease. 

Yet Pride-HD failed to show that pridopi- 
dine led to improvements in motor function, 
which had been declared as the study’s primary 
endpoint (a predetermined milestone that sig- 
nals the success of a treatment). However, the 
researchers did notice some improvement in 
one of the six components of the UHDRS, a 
measure known as total functional capacity. As 
one of the most subjective parts of the scale, 
it considers whether a person is able to work, 
handle finances or perform self-care tasks. 


PLACEBO EFFECT 
The results generated hope that pridopidine 
might modify the progression of Huntington's 
disease, as opposed to its symptoms. But, as 
Leavitt points out, it’s more likely that there is 
an alternative explanation for its effect. 
“What's pretty clear to me is that there’s a 
big placebo effect seen with investigator-rated 
scales like UHDRS that isn’t there in more
quantitative measures,” he says. “This is why 
you must define your primary endpoint before 
the trial — you can't keep going back and look- 
ing until you find something that works.” 

In the absence of better options, and despite its limitations, the UHDRS has been the primary endpoint of choice for two decades. “Objective, quantitative measures will give us more sensitivity,” says Leavitt. In 2012, Tabrizi, Leavitt and their collaborators published the results of TRACK-HD — a major study that used a battery of approaches, including brain imaging and cognitive, motor and psychiatric assessments, to monitor more than 100 people with early-stage Huntington’s disease over a period of two years. The study also followed carriers of the mutated gene who were yet to show signs of the disease, as well as people without the mutation.

They found a number of measurable features 
that reflected disease progression, including 
brain volume on magnetic resonance imaging 
scans and specific motor and cognitive charac- 
teristics. The TRACK-HD team have pooled 
their data with results from other cohort stud- 
ies to generate a composite endpoint for future 
trials. It comprises a suite of cognitive, motor 
and physiological traits that can accurately 
assess the progression of Huntington's disease. 

Known as the composite UHDRS, it’s built 
on the bones of the original scale and retains the 
most reliable tests. It also includes further cogni- 
tive measures and ditches traits that don’t show 
much progression with time, such as emotional 
recognition and tongue-muscle strength. 

The combined study, which included more than 1,600 people with early-stage Huntington’s disease, defined important parameters such as the number of participants that is needed to ensure statistical rigour, as well as the optimal duration of a trial. “Huntington’s is a slowly progressive disease, but we showed that we could measure progression in almost everybody over just two years,” says Tabrizi. The team also reached agreement on the degree to which progression should be slowed for trials of disease-modifying treatments to be deemed a success: people who receive the treatment should show a decline on the composite UHDRS that is 20–30% slower than that of those who receive a placebo.

It seems obvious that trials should be designed to paint the most accurate picture of the benefits and risks of treatments. But Tabrizi suggests that the lack of effective disease-modifying treatments for Huntington’s disease, together with the fact that drugs such as ASOs must be administered directly into cerebrospinal fluid (CSF) by lumbar puncture — an uncomfortable procedure in which a needle is inserted into the spinal canal — means that researchers have an ethical duty to make sure that trials are designed as well as possible to reveal whether treatments are working.


MARKERS OF SUCCESS 

Although Leavitt and Tabrizi agree that the composite UHDRS is an improvement on the conventional scale, the hunt is on for biological markers (biomarkers) that change as Huntington’s disease progresses. The most obvious candidate is mutant huntingtin, which was thought to leach into CSF from damaged brain cells, in a similar way to CSF biomarkers now used for other neurodegenerative conditions. However, developing a reliable test for quantifying the protein has proved challenging, in part because it is present in CSF at very low concentrations.

As a solution, Leavitt and his colleagues are developing ultrasensitive assays that can detect changes in levels of huntingtin in CSF with disease progression. One approach, based on immunoprecipitation and flow cytometry, revealed a decrease in the mutant protein following treatment with ASOs in a mouse model of Huntington’s disease, and could serve as a primary endpoint for trials of disease-modifying treatments. An alternative approach that counts single molecules of mutant huntingtin was used in a 2015 trial of ASOs (see page S39). In any case, monitoring protein levels in CSF would require repeated invasive lumbar punctures.

Other teams are using blood as an easier-to-access source of potential biomarkers. One such candidate molecule is neurofilament light polypeptide (NF-L) — a component of neurons that is released into CSF as the cells die, eventually making its way into blood. Levels of NF-L in blood plasma mirror those of mutant huntingtin in CSF, and a retrospective study of more than 200 people with Huntington’s disease or who carry the mutation that leads to the condition showed that NF-L levels could be used to predict the onset of symptoms, as well as to track disease progression.

Despite promising results, biomarkers in blood or CSF are only surrogates for the underlying disease that ravages the brain. The most direct assay would involve imaging mutant huntingtin in the brain to determine whether it diminishes after treatment, although this is technically challenging. Researchers funded by the US non-profit CHDI Foundation are developing radioactive ‘flags’, or ligands, that bind to clumps of mutant huntingtin in the brain and can be detected by positron emission tomography. Trials in people are expected to start later this year, according to Cristina Sampaio, chief medical officer at CHDI.

Leavitt and his team are also interested in 
using wearable sensors to monitor certain bio- 
markers such as changes in gait or cognitive 
function in real time. Investigators who use the 
UHDRS or similar scales can assess the abilities 
of patients only on the days on which they visit 
the clinic. However, sensors such as accelerom- 
eters can take measurements continually over 
periods of days, weeks or months, and are even 
able to monitor sleep patterns and activity levels.

This data stream can be relayed from people’s homes to the clinic, creating a more detailed profile of symptom progression. Initial studies used lab-improvised systems of sensors that are strapped to the chest, wrist and ankles. But off-the-shelf technologies such as fitness trackers or smart watches, in conjunction with smartphones, are likely to become a more practical option. “We can design simple tests on a smartphone to measure gait, walking speed or cognitive function, and we can collect daily data on mood or any other problems they might be having,” Leavitt says.

Real-time, remote monitoring of such 
biomarkers through smartphone apps could 
reduce the burden of taking part in trials 
for participants and carers. Travelling long 
distances to a hospital or trial centre can be 
arduous, especially for people with advanced 
Huntington's disease. And the continuous 
collection of data would make it easier for 
researchers to follow overall disease-progres- 
sion trends and to build a more accurate idea 
of each person's response to treatment. 

Despite progress being made, those who are 
developing new primary endpoints find them- 
selves in a chicken-and-egg situation. To show 
that they work, trials of disease-modifying 
treatments need more-appropriate endpoints 
than those provided by the UHDRS. But the 
improved endpoints can be validated only 
against effective drugs, to demonstrate that they 
accurately measure disease progression and 
patients’ responses to treatment. The current 
generation of trials is beginning to incorporate 
measures such as biomarkers and brain imaging 
as exploratory secondary endpoints, alongside 
the UHDRS. Despite its flaws, the UHDRS is 
still the only tried-and-true measure of disease 
progression available to researchers. 

“It’s a circular problem. We have exciting new measures and therapies, but we don’t have a good way of comparing them to prove that they work,” says Leavitt. “Our main clinical endpoint is still the old UHDRS, which isn’t that great. We’re at the point now where we need an effective therapy to show how things respond.”

As Huntington’s disease enters an era of targeted molecular treatments, Tabrizi thinks that researchers owe it to those affected to design the best possible trials in which to test such drugs. “We’ve spent years studying the natural history of the disease to develop our armamentarium for these trials, and we’re just waiting for really good drugs,” she says. “Huntington’s is a terrible disease with a huge unmet need, and patients and their families desperately want treatments that work. We cannot afford to mess this up.”


Kat Arney is a science writer and broadcaster 
based near London. 
4 BIG QUESTIONS

Although potential treatments are now entering the pipeline, the molecular cause and progression of Huntington’s disease continue to elude researchers.

BY ANNA NOWOGRODZKI

How does the mutant protein huntingtin cause Huntington’s disease?

Huntington’s disease is caused by a mutation in a single gene called huntingtin (HTT), which encodes the protein huntingtin. Understanding how the mutant protein causes disease could open up avenues for treating the condition and its symptoms.

Mutant huntingtin forms clumps inside the cell that seem to interfere with communication along the axons of neurons. Such aggregates can also throw a wrench into the transcription of other genes and hinder cells’ waste-removal systems.

Various projects led by researchers, companies and non-profit organizations are using computational methods to better understand the shape of mutant huntingtin, how it aggregates, and how it interacts with other proteins in the cell.

What is the role of normal huntingtin?

To help develop treatments for Huntington’s disease, including those that use RNA interference, antisense oligonucleotides (ASOs) or gene editing, it’s important to determine whether normal huntingtin, as well as the mutant version, can be reduced or eliminated safely.

Blocking Htt expression in mouse embryos is lethal. In adult mice, some studies show that removing normal huntingtin has only limited effects, whereas others indicate it shortens lifespan and causes nerve and behavioural problems. The effect of reducing huntingtin in people is unknown.

Researchers are eliminating normal huntingtin in mammals with lifespans longer than those of mice to determine any long-term effects. Efforts are also underway to inactivate just the mutated copy of HTT, leaving the normal version intact, using the gene-editing tool CRISPR-Cas9.

How can we better characterize the progression of Huntington’s disease?

To more effectively assess treatments in trials, doctors need improved ways of measuring whether they slow disease progression. The current best tool is the clinician-rated Unified Huntington’s Disease Rating Scale, which is reliable but prone to the power of the placebo effect.

A 2017 study showed that changes in levels of neurofilament light polypeptide (NF-L) in blood correlate with the onset of Huntington’s disease, making it a possible biomarker. Other biomarkers that correlate with the condition can be measured by functional brain imaging.

Three large long-term observational studies have been designed to assess the ability of potential biomarkers to measure disease progression. The team investigating NF-L has launched a 600-participant study, and is already monitoring NF-L levels in at least 80 people.

Will ASOs be the first effective treatment for Huntington’s disease?

ASOs are the first potential treatment to have successfully lowered levels of mutant huntingtin in trials conducted in people. But it’s uncertain whether these molecules can slow or halt progression of Huntington’s disease.

In a phase I/IIa trial, an ASO called IONIS-HTTRx reduced the levels of mutant huntingtin in participants’ cerebrospinal fluid. But the trial was too short to determine the treatment’s long-term effects. The drug is delivered once a month via an injection into the spine.

Further trials of IONIS-HTTRx with larger numbers of participants are needed to determine whether the drug is effective at treating Huntington’s disease. Researchers are also monitoring the 46 participants of the initial trial for any long-term effects.

Anna Nowogrodzki is a freelance science writer based near Boston, Massachusetts.
XINHUA/REX/SHUTTERSTOCK


ZHEJIANG 


Visitors watch a robot during the 2015 China Yiwu International Manufacturing Equipment Expo & China Intelligent Expo in Yiwu, Zhejiang province. 


Zhejiang province is 
open for science business 


In the global race for high-tech excellence, a historic region of China is 
taking a prominent role on the international stage. 


BY SARAH O’MEARA

Every weekend, busloads of tourists arrive at the newly opened International Campus Zhejiang University in Haining City. It is a 20-minute high-speed train ride away from the province’s historic capital city of Hangzhou, home to the campus’s prestigious parent institution, Zhejiang University. The 80-hectare plot, with its modern architecture, water features and pleasant walkways, is a dream for day trippers, who come to picnic by the artificial lake.

The arrival of the tourists initially came as a surprise to Philip Krein, full-time dean of the Zhejiang University/University of Illinois at Urbana-Champaign Institute in Haining. Krein moved there from the United States in 2016 and was not used to seeing universities as tourist destinations. But he’s getting used to the flag-bearing tour guides ushering visitors around the facilities, which opened in 2017. His is one of a number of overseas universities that have daughter institutions or collaborative labs in Zhejiang, including Imperial College London and the University of Edinburgh, UK.
“Academic institutions are extremely important and valued in China,” he says. “People want to see how the country is developing.”
Although the region’s historic strengths lie in its prominence as a shipping hub and as an access point to central China, Zhejiang’s government leaders think that future economic growth will come from investment in its digital economy. At every level of public life, from university programmes to city management, officials are working with scientists and engineers to put cutting-edge science — such as artificial intelligence, big data and cloud computing — at the heart of the region’s development, and to further internationalize the area. Zhejiang’s global significance was affirmed in 2016, when Hangzhou hosted the first Group of 20 (G20) meeting of world leaders ever to be held in China.

The economic change taking place in Zhejiang reflects the country’s wider ambitions (see ‘On the map’). China’s economy is in a period of rapid transition. The government’s goal, originally set out in a 15-year science and technology plan in 2006, is to transform the country from a low-cost manufacturer into a technologically advanced, innovative economy in which products are no longer just made, but “created”, in China.

By 2049 — the centenary of the founding of 
the People’s Republic of China — the country 
plans to be a world-leading science and technol- 
ogy power. It also wants to share its expertise 
with trading countries across Asia, Europe, the 
Middle East and East Africa through higher- 
education initiatives, as part of its ‘One Belt, 
One Road’ foreign-policy programme, initi- 
ated in 2013. That project aims to strengthen 
economic and diplomatic relationships between 
China and regional trading partners through 
the construction of vast infrastructure projects, 
including roads, railways and docks. 


BIG PICTURE 

Krein says that the clear focus of Zhejiang’s 
government and its international outlook ena- 
bles faculty members at his university to think 
big. On campus, the three foreign universities 
are in the early stages of planning collaborative 
projects, such as one at the Biomedical Trans- 
lational Research Centre, a facility announced 
in 2018 to focus on turning academic research 
into technologies that can improve health care. 

Krein’s team is also developing a large-scale 
‘intelligent’ infrastructure project that uses 
millions of sensors to evaluate the distribution 
of bridges, roads and railways, alongside sen- 
sors in the water systems, to build up a compre- 
hensive picture of the flow of traffic, waste and 
people across urban areas. Both projects are in 
their early stages, but Krein expects funding to 
come from a consortium of industry and local 
government. 

“We're not a satellite organization. This is 
a genuine collaboration,” explains Krein. He 
is trying to develop unusual, cross-discipline 
projects and foster teaching styles that encour- 
age greater creativity and curiosity among 
Chinese students — an endeavour spurred by 
a wider concern that China's teaching culture, 
from an early age, prioritizes rote learning over 
critical thinking. 

A decade ago, China’s government launched an initiative to reverse its academic brain drain, after research revealed that of 1.1 million Chinese people who had left to study overseas since 1978, only 275,000 had returned, as of June 2007 (see go.nature.com/bdrain). High salaries were offered by universities to Chinese-born, Western-trained professors willing to return to China as part of a programme called 1000 Talents.

Despite such efforts, it’s still common for ambitious young scientists to head abroad after finishing their degree in China, because experience in foreign labs remains highly valued. This situation can make it difficult for Chinese universities to recruit high-level talent at the postdoc level, says Jiaming Hu, a neuroscientist at Zhejiang University. Hu says that universities often require newly hired professors and associate professors to have at least two years of research experience abroad.

Zhejiang province is less well known than other Chinese regions, but its health is vital to the country’s rapidly internationalizing science economy.

Yet early-career Chinese scientists who choose to stay in Zhejiang can be in a strong position to find rewarding jobs, both in state-funded and commercial labs, given that the province is betting much of its future on science and technology development. Hu’s lab is just 3 years old, and the product of a US$25-million, 5-year investment into the institute. That money also bought a large primate facility, which Hu says is unlike anything he’d find in the West. “Some of my friends went to foreign labs and sometimes find it not that easy to get all the animal resources that they need,” Hu says.

Researchers are also finding that, even if 
they stay in China, opportunities to collaborate 
with foreign scientists are growing. Materials 
scientist Yong Wang, who returned to China in 
2012 as part of the 1000 Talents programme, 
says that Zhejiang University offers multiple 
programmes to enable staff and students to go 
abroad for a short stint. 

Guohua Xu, an evolutionary geneticist who works alongside Jiaming Hu at Zhejiang University, adds that leading scholars frequently visit the campus and give talks. Both Hu and Xu highlight the generous funding researchers in China enjoy. “The advantage is that we often have equipment many Western universities do not have,” Xu says. “The disadvantage is that we are not yet at a high enough level to compete with the best international teams and attract the best talent.”

But that generosity isn’t always balanced 
across research fields. A reform of the funding 
system, announced in March, seems to shift 
priorities away from basic-science research 
and towards government-sanctioned projects, 
at a time when many prominent Chinese 
researchers say that China's spending is already 
too low compared with other nations. 

Wang remains optimistic. “I think it’s 
possible to do both. To work within interna- 
tional teams and pursue universal truths, and 
also achieve China-specific goals,” he says. 

For entrepreneurial scientists in Zhejiang, many routes are available for turning research into commercial products. Zhejiang University has invested in its own industrial park, 1 of 15 similar science parks across China that are designed to act as incubators for commercial enterprises and start-ups. Funding comes from both the private and public sectors. Recent commercial successes include Drore Technology, which offers tourism-focused smart technology such as interactive maps and intelligent audio guides, and NationalChip, which manufactures computer chips for television systems.

Last year, the province's capital opened a 
113-square-kilometre zone for the development 
of science and technology, called Hangzhou 
Future Sci-Tech City. Within the site lies AI
Town. It opened last year, and plans to have 
20,000 researchers and 200 innovation teams, 
led by top scientists and industry pioneers, on 
site by 2022. The vast collaboration will bring 
together innovative companies, such as Alibaba 
and the Chinese Internet giant Baidu, with a 
variety of prominent university teams. 

The impetus to turn research into viable 
products is embedded into university cul- 
ture, says Anna Wang Roe, a neuroscientist 
at Zhejiang University. Staff and postdocs are 
encouraged to apply for opportunities to set 
up new ventures. All of this comes with a price 
tag: each year, the province invests around 
$20 million to $30 million in Zhejiang Uni- 
versity to encourage the development of com- 
panies. In early 2018, the province reported 
that, between 2013 and 2017, the number of 
high-tech enterprises had more than doubled, 
to 11,462, whereas small and medium-sized 
science and technology enterprises had 
increased eightfold, reaching 40,440. 


PIONEERING ROOTS 

“The local government here has always looked ahead,” says May Tan-Mullins, who studies international relations at the University of Nottingham Ningbo China. “Fifteen years ago, as the city was growing, they planted trees, when no other Chinese cities did. Locals complained they were wasting money that should be spent on houses. Now people appreciate the city’s green belt.”

Tan-Mullins’s institution — a joint venture 
between the University of Nottingham, UK, 
and the Zhejiang Wanli Education Group — is 
a reflection of an internationalizing China. 
“We have staff from over 50 countries and 
students from more than 70,” says Tan-Mullins. 

Although the university’s major market is 
still domestic, about 10% of students come 
from overseas — the highest proportion of 
any Chinese university. And 90% of faculty 
members are international. “We joke that on 
our 144 acres of land, we probably have the 
highest concentration of foreigners in China,” 
says Tan-Mullins. 

“It’s brought Ningbo on to the global stage,” she says. “And also onto the China stage.”


Sarah O'Meara is a writer in Shanghai. 
Additional research by Liu Shaoxin. 
Q&A Hailan Hu
The zest of Zhejiang 


Neuroscientist Hailan Hu moved back to her home town to capitalize on its vibrant science 
environment. She researches depression at Zhejiang University School of Medicine in Hangzhou. 


Why did you return to Hangzhou? 

The medical school has a supportive and 
nurturing environment where I could grow. 
Being close to family and friends is a plus. 


What were you doing before that? 

After graduating from Peking University 
in Beijing, I did a PhD at the University of
California, Berkeley, followed by a postdoc 
at Cold Spring Harbor Laboratory in 
New York. I joined the Chinese Academy 
of Sciences Institute of Neuroscience in 
Shanghai in 2008. 


What do you like about Zhejiang University? 
We foster a strong culture of interdisciplinary 
research. My own lab collaborates with 
engineering, computer science, pharma- 
cology and chemistry teams. Zhejiang 
University has several affiliated hospitals 
that provide a good platform for translational 
medical research, and it has a reputation 
for a strong entrepreneurial spirit. 


Tell us about your team and current research. 
It's an entirely Chinese team right now, but 
I’m considering taking on foreign students 
as we grow. In 2016, we discovered how the 
anaesthetic ketamine blocks electrical bursts 
from a region of the brain and relieves the 
symptoms of severe depression. We're talking 
to scientists and clinicians worldwide about 
translating the research into antidepressants. 


Are you planning new collaborations abroad? 
Our neuroscience centre is going to establish 
a programme with the University of Toronto 
in Canada, for students and postdocs to work 
overseas. We have formal collaborations with 
the University of California, Los Angeles, 
Columbia University in New York City and 
the University of Melbourne in Australia. 
How has Hangzhou changed since you left?

Hangzhou was China's top honeymoon 
destination. Now, it is still beautiful, but its 
tourism-based economy has transformed. 
The atmosphere is modern and interna- 
tional. It feels different. I almost never take 
my wallet out; I pay using my mobile phone. 


Are you surprised by the transformation? 
Change was always a part of life here. There’s 
a saying: the three big eastern economies, 
Beijing, Shanghai and Zhejiang, have dif- 
ferent business models. Beijing has the 
government-owned businesses, Shanghai 
favours foreign companies and big brands 
and Zhejiang cultivates entrepreneurship. 


What practical benefits do you find China has? 
I perhaps spend less time writing grants 
compared with my peers in the United States. 


What advances do you hope to see in the 
Chinese research environment in the future? 
As basic research booms and more graduate 
students and postdocs are doing outstanding 
work in China, I hope we can provide equal 
support to those trained at home and abroad. 
At present, some career-development grants 
are designed for Chinese scientists who have 
studied and worked abroad. Now is the time 
to extend these schemes to all qualified 
young trainees. 


What attracts young scientists to Zhejiang? 

With its nice environment and low cost of 
living, I think Zhejiang has become as attrac- 
tive to researchers as Beijing or Shanghai, if 
not more so. The opportunities are growing.


INTERVIEW BY SARAH O’MEARA. 
ADDITIONAL RESEARCH BY LIU SHAOXIN 


This interview has been edited for length and clarity. 
Computer-calculated
compounds 


Researchers are deploying artificial intelligence to discover drugs.


BY NIC FLEMING


An enormous figure looms over scientists
searching for new drugs: the estimated
US$2.6-billion price tag of developing
a treatment. A lot of that effectively goes down
the drain, because it includes money spent on
the nine out of ten candidate therapies that fail
somewhere between phase I trials and regulatory
approval. Few people in the field doubt the
need to do things differently.

Leading biopharmaceutical companies
believe a solution is at hand. Pfizer is using IBM
Watson, a system that uses machine learning, to
power its search for immuno-oncology drugs.
Sanofi has signed a deal to use UK start-up 
Exscientia’s artificial-intelligence (AI) plat- 
form to hunt for metabolic-disease therapies, 
and Roche subsidiary Genentech is using an 
AI system from GNS Healthcare in Cambridge,
Massachusetts, to help drive the multinational 
company’s search for cancer treatments. 
Most sizeable biopharma players have similar 
collaborations or internal programmes. 

If the proponents of these techniques are 
right, AI and machine learning will usher in
an era of quicker, cheaper and more-effective
drug discovery. Some are sceptical, but most 
experts do expect these tools to become 
increasingly important. This shift presents 
both challenges and opportunities for scien- 
tists, especially when the techniques are com- 
bined with automation (see ‘Here come the 
robots’). Early-career researchers, in particular, 
need to get to grips with what AI can do and 
how best to acquire the skills they need to be 
employable in the job market of tomorrow. 

The AI pioneers of the 1950s discussed
building machines that could sense, reason
and think like people — a concept known 
as ‘general AI’ that is likely to remain in the 
realms of science fiction for some time. How- 
ever, the continued rapid growth in computer- 
processing power over the past two decades, 
the availability of large data sets and the devel- 
opment of advanced algorithms have driven 
major improvements in machine learning. 
This has helped to bring about ‘narrow AI’,
which focuses on specific tasks. These include 
improved abilities to analyse, understand and 
generate text and speech through an AI tech- 
nique called natural-language processing, and 
artificial neural networks designed to mimic 
the way our brains make sense of the world. 
Such techniques are already in widespread use 
in fields such as computer vision, voice analy- 
sis and route selection. This progress has also 
triggered a wave of start-ups that employ AI 
for drug discovery, with many of them using 
it to identify patterns hidden in large volumes 
of data. 

For example, researchers at biotechnology 
company Berg, near Boston, Massachusetts, 
have developed a model to identify previously 
unknown cancer mechanisms using tests on 
more than 1,000 cancerous and healthy human 
cell samples. They modelled diseased human 
cells by varying the levels of sugar and oxygen 
the cells were exposed to, and then tracked their 
lipid, metabolite, enzyme and protein profiles. 
The group uses its AI platform to generate and 
analyse immense amounts of biological and 
outcomes data from patients to highlight key 
differences between diseased and healthy cells. 

The aim of Berg’s approach is to identify 
potential treatments on the basis of the precise 
biological causes of disease. “We are turning 
the drug-discovery paradigm upside down by 
using patient-driven biology and data to derive 
more-predictive hypotheses, rather than the traditional
trial-and-error approach,” says Niven
Narain, Berg’s co-founder and chief executive. 

Using this approach, Narain’s team identified 
the importance of certain naturally occurring 
molecules in cancer metabolism. This led 
the group to discover how a new cancer drug 
works, and indicated some possible therapeu- 
tic uses. The drug, BPM31510, is currently in 
a phase II clinical trial involving people with
advanced pancreatic cancer. The company 
is also using this AI system to look for drug 
targets and therapies for other conditions, 
including diabetes and Parkinson's disease. 

London-based start-up firm BenevolentBio 
has its own AI platform, into which it feeds 
data from sources such as research papers, 
patents, clinical trials and patient records. 
This forms a representation, based in the 
cloud, of more than one billion known and 
inferred relationships between biological 
entities such as genes, symptoms, diseases, 
proteins, tissues, species and candidate drugs. 
This can be queried rather like a search engine, 
to produce ‘knowledge graphs’ of, for exam- 
ple, a medical condition and the genes that 
are associated with it, or the compounds that
have been shown to affect it. Most of the data 
that the platform crunches are not annotated, 
so it uses natural-language processing to rec- 
ognize entities and understand their links to 
other things. “AI can put all this data in context 
and surface the most salient information for 
drug-discovery scientists,” says Jackie Hunter,
chief executive of BenevolentBio. 

When the company asked this system to 
suggest new ways to treat amyotrophic lateral 
sclerosis (ALS), also known as motor neuron 
disease (MND), it flagged around 100 existing 
compounds as having potential. From these, sci- 
entists at BenevolentBio selected five to undergo 
tests using patient-derived cells at the Sheffield 
Institute of Translational Neuroscience, UK. 
The research, presented at the International 
Symposium on ALS/MND in Boston, Mas- 
sachusetts, in December 2017, found that four 
of these compounds had promise, and one was 
shown to delay neurological symptoms in mice. 


PATTERN RECOGNITION 

Despite these promising applications, many 
scientists are unaware of the capabilities of AI.
A survey published in February by BenchSci, 
a start-up in Toronto, Canada, that provides a 
machine-learning tool for scientists searching 
for antibodies, found that 41% of the 330 drug- 
discovery researchers who took part were 
unfamiliar with the uses of AI (see go.nature.com/2xarpt3).

Leaders in the field think that researchers 
should brush up on this knowledge as soon as 
possible. 

“AI is going to lead to the full understanding
of human biology and give us the means
to fully address human disease,” says Thomas 
Chittenden, who leads a team at WuXi NextCODE
in Cambridge, Massachusetts. WuXi NextCODE was
formed in 2015 after drug-discovery firm
WuXi AppTec in Shanghai,
China, acquired NextCODE Health, a spin-off 
from Icelandic company deCODE Genetics. 
“The way we develop drugs and assess them 
in clinical trials will all come down to very 
sophisticated pattern recognition,” he says. 

In May 2017, a group including researchers
at Yale University in New Haven, Connecticut,
demonstrated the role of a family of proteins 
called fibroblast growth factors (FGFs) in 
blood-vessel development (P. Yu et al. Nature 
545, 224-228; 2017). This process is key to both 
tumour growth and cardiovascular disease. 
WuXi NextCODE uses AI as part of its approach
of classifying genes according to their roles 
and other attributes, to look for connections 
between RNA-sequence variations, expression 
levels, molecular function and gene location. 
Using this approach, Chittenden’s team discov- 
ered that FGFs exert their influence through the 
control of glucose metabolism. 

Some think the potential of AI to pin- 
point previously unknown causes of disease 
will accelerate the trend towards treatments 
designed for patients with specific biologi- 
cal profiles. “Personalized medicine has been 
talked about for a long time,” says Hunter. “AI
is going to enable it.”

Sceptics point out that some of these more 
enthusiastic claims echo the excitement over 
computer-aided drug design, which began in 
the early 1980s. Although such in silico model- 
ling techniques are important in modern drug 
research and development (R&D), they have 
not halted a decline in pharmaceutical-industry 
R&D productivity dating back to the mid-1990s. 


MOVING GOALPOSTS 

Whatever happens, industry leaders agree that 
drug-discovery jobs and the skills needed to do 
them are unlikely to remain the same. Some 
think that broader training is needed. Narain 
says that “there needs to be a radical shift” 
in the way PhDs and other graduate courses 
are conducted, and that this should extend 
to medical-school and undergraduate teach- 
ing. He adds, “The years of students focusing 
solely on — and learning more than anyone 
else about — a particular gene mutation, say, 
are over.” Chittenden agrees: “The PhD is 
going to look very different ten years from 
now. Academic curricula will be broader. The 
next generation needs, first and foremost, the 
understanding of human biology, but coupled 
with computer science, computational statistics
and statistical machine learning.”

Others think it is more a case of picking up 
the basics without diverting attention from core 
areas of expertise. “Undergraduates in biology 
need to move towards basic competency in 
statistics and computational ideas,” says Russ 
Altman, a biomedical AI researcher at Stan- 
ford University in California. “But at PhD level, 
people need to acquire deep, technical skills. 
They will be paid for depth, not breadth.”

In 2003, Altman co-launched an under- 
graduate degree in biomedical computation 
for students who want to delve deeply into 
both disciplines. It was relaunched within his 
institution’s bioengineering department in 
March. “I think that at Stanford we're getting 
an early look at what is going to be happening 
at campuses worldwide,” he says. 

HERE COME THE ROBOTS


A new type of scientist


When the time comes for the history of
artificial intelligence (AI) to be written, the
algorithm that gets the job is likely to flag
12 June 2007 as worthy of note. That was
the day that a robot called Adam ended
humanity’s monopoly on the discovery of
scientific knowledge — by identifying the
function of a yeast gene.

By searching public databases, Adam
generated hypotheses about which genes
code for key enzymes that catalyse reactions
in the yeast Saccharomyces cerevisiae, and
used robotics to physically test its predictions
in a lab. Researchers at the UK universities
of Aberystwyth and Cambridge then
independently tested Adam’s hypotheses
about the functions of 19 genes; 9 were new
and accurate, and only 1 was wrong.

“Robot scientists using AI can test more
compounds, and do so with improved
accuracy and reproducibility, and exhaustive,
searchable record-keeping,” says systems
biologist Steve Oliver of the University of
Cambridge, a member of the group that
developed Adam.

In January, the same team announced
that Adam’s more advanced robot
colleague, Eve, had discovered that triclosan,
a common ingredient in toothpaste, could
potentially treat drug-resistant malaria
parasites. The researchers developed
strains of yeast in which genes essential
for growth had been replaced with their
equivalents either from malaria parasites or
from humans. Eve then screened thousands
of compounds to find those that halted or
severely slowed the growth of the strains
dependent on the malaria genes but not
those containing the human genes — to
target the parasites while reducing the risk
of toxicity. Early results were used to inform
the selection of later candidates to screen.

This identified triclosan as affecting
malaria-parasite growth by inhibiting the
DHFR enzyme — also the target of the
antimalarial drug pyrimethamine. However,
resistance to pyrimethamine is common.
The researchers showed that triclosan
could act on DHFR even in
pyrimethamine-resistant parasites. N.F.


There is little consensus about how, even
just a decade from now, AI will affect the skills
needed to discover the therapies of the future.
“Being able to code will be useful for at least
the next 5-10 years, but my suspicion is that,
beyond that, computers will largely do it for
us,” says computational medicinal chemist
Anthony Bradley at the University of Oxford,
UK. “In the lab, we might need a more highly
trained, specialized workforce working with
the automation and AI experts to fine-tune
processes in particular reaction areas,” he
says. Or, he adds, it might be that wet-lab skills
(those needed to perform practical chemical or
biological experiments) “might be no use ten
years from now”.

Bradley uses the Diamond Light Source
synchrotron near Oxford to screen compounds for
small chemical fragments that bind to molecular
targets, even if only weakly, with the aim of
improving their binding strength to produce
new therapies. He is a member of a group that is
using artificial neural networks — an approach
to training algorithms inspired by the way our
brains process information — as part of a
structure-based drug-design project with the Oxford
Protein Informatics Group. The aim is to use
publicly available data on the structural and
chemical activity of small molecules to teach
their system to identify those that will act on
protein drug targets.

What can those hoping to work in drug
discovery do to prepare themselves for this
rapidly evolving environment? Taking steps
to become informed and flexible is important,
say those at the cutting edge of the field.


“My training gave me the groundwork so that I 
knew roughly where the field was, but to some 
extent it’s down to students themselves to see 
the way technological trends are going,” says
Bradley. “Only by remaining versatile can you 
make the best use of the power of the available 
tools.” He advises those seeking to enter the 
drug-discovery field to keep track of develop- 
ments in AI by monitoring the latest articles in 
leading journals and technology-focused news 
sources and blogs. 

Self-driven learning is especially important, 
Bradley says, because there are limits to how 
well universities can provide the skills that students
need to be ready for the future role of AI
in research. “Almost by definition,” he says, “no
one can really know what those skills will be.”

Some of the more extravagant predictions 
being made about the ability of AI to revolu- 
tionize drug discovery might well turn out to 
be overblown. Critics point out that there are 
commercial interests at play, and that, as yet, 
there are no approved AI-developed drugs.
Narain, who thinks the technology will drive 
major advances, agrees that overblown claims 
are being made, but says it won't be long before 
these are exposed for what they are. “The hype 
can’t last very long because over the next five
years or so, the truth will come out in the data,”
he says. “If by then we are creating better drugs,
and doing it faster and cheaper, then AI will
really take off.”


Nic Fleming is a freelance science writer 
based in Bristol, UK. 